Variational method in pymc

jsh980210 · February 7, 2023, 7:34pm

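import arviz as az
import pymc as pm

# X_train (a DataFrame with columns 'feature_1' and 'feature_2') and
# y_train (0/1 labels) are assumed to be defined already.
with pm.Model() as logistic_model:
    # Normal priors (mean 0, standard deviation 4) on the intercept and slopes
    beta_0 = pm.Normal('beta_0', 0, 4)
    beta_1 = pm.Normal('beta_1', 0, 4)
    beta_2 = pm.Normal('beta_2', 0, 4)

    # Mutable data containers, so the inputs can be swapped later (e.g. for test data)
    feature_1 = pm.Data("feature_1", value=X_train['feature_1'], mutable=True)
    feature_2 = pm.Data("feature_2", value=X_train['feature_2'], mutable=True)
    label = pm.Data("label", value=y_train, mutable=True)

    # Bernoulli likelihood with a logistic (sigmoid) link
    p = pm.math.sigmoid(beta_0 + beta_1 * feature_1 + beta_2 * feature_2)
    observed = pm.Bernoulli("binary_label", p=p, observed=label)

with logistic_model:
    # Mean-field ADVI: optimize the approximation for 100,000 iterations
    mean_field = pm.fit(n=100000, method='advi')
    # Draw 2000 samples from the fitted approximation
    trace = mean_field.sample(2000)
    az.plot_trace(trace)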

Here I am building a PyMC logistic regression model, and I am using a variational method (ADVI) to obtain samples of the coefficients. I want to ask whether variational methods can produce multiple sample chains, as NUTS does. In this plot I got only one chain for each parameter.

ckrapu · February 8, 2023, 9:57pm

Your question is a reasonable one, but it turns out the answer to this:

I want to ask whether variational methods could get multiple sample chains, like NUTS do.

is a little more complicated.

In short, there are no “chains” in the mean-field variational inference (MFVI) method that PyMC uses. Chains appear with the No-U-Turn Sampler (NUTS) because NUTS belongs to the class of Markov chain Monte Carlo (MCMC) methods, where each chain is a sequence of autocorrelated draws: every draw depends on the one before it. MFVI does not generate its draws sequentially, so there is nothing to form a chain.
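
To make the difference concrete, here is a small NumPy sketch (not PyMC code). It uses a plain random-walk Metropolis sampler, a much simpler MCMC method than NUTS that is swapped in only because it fits in a few lines, on a standard normal target, and compares the lag-1 autocorrelation of its chain with that of independent draws. The function names, step size and seeds are illustrative choices.

```python
import numpy as np


def log_density(x):
    # Standard normal target, up to an additive constant
    return -0.5 * x**2


def random_walk_metropolis(n_draws, step=0.5, seed=0):
    # Each draw is proposed from the previous one, so the draws form a Markov chain
    rng = np.random.default_rng(seed)
    chain = np.empty(n_draws)
    x = 0.0
    for t in range(n_draws):
        proposal = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < log_density(proposal) - log_density(x):
            x = proposal  # accept; otherwise the chain repeats the current value
        chain[t] = x
    return chain


def lag1_autocorr(draws):
    # Correlation between each draw and the next one
    centered = draws - draws.mean()
    return float(np.dot(centered[:-1], centered[1:]) / np.dot(centered, centered))


mcmc_draws = random_walk_metropolis(5000)
independent_draws = np.random.default_rng(1).standard_normal(5000)

print(f"MCMC lag-1 autocorrelation:        {lag1_autocorr(mcmc_draws):.2f}")
print(f"independent lag-1 autocorrelation: {lag1_autocorr(independent_draws):.2f}")
```

The chain's lag-1 autocorrelation comes out high, while the independent draws give a value near 0. NUTS makes much larger moves than this sampler, so its autocorrelation is usually far lower, but its draws are still sequential.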

Variational inference works differently. It first assumes a (potentially multivariate) Gaussian approximation to the posterior and then runs many optimization steps to fit the parameters of that approximation. Once that is done, we generate samples by drawing from the fitted Gaussian. These draws are independent: we do not need a sequence of intermediate values as in MCMC. So to get 1000 independent samples under the MFVI approximation, we simply draw 1000 samples from the Gaussian. To get the equivalent of 1000 independent samples with NUTS, we would typically need well over 1000 actual draws, because consecutive draws are correlated. There is no point in running multiple “chains” for MFVI: all of its samples are independent to begin with, so the issue of sequential correlation never arises.
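
The fit-then-sample idea can be sketched in plain NumPy. This is a toy version of mean-field ADVI, not PyMC's implementation (PyMC uses its own optimizer and transforms constrained parameters): the "posterior" is a hypothetical 2-D Gaussian with made-up mean `m` and covariance `Sigma`, the ELBO gradients are estimated with the reparameterization trick, and the optimizer is fixed-step gradient ascent with iterate averaging. Once the mean and log standard deviations of q are fitted, 1000 independent draws come from a single vectorized call, with no chain involved.

```python
import numpy as np

# Hypothetical target "posterior": a correlated 2-D Gaussian N(m, Sigma)
m = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)  # precision matrix


def grad_log_p(theta):
    # Gradient of log N(theta | m, Sigma) for each row of theta
    return -(theta - m) @ Lam


def fit_mean_field(n_steps=3000, n_mc=50, lr=0.05, seed=0):
    # q(theta) = N(mu, diag(exp(log_sd)**2)), fitted by stochastic gradient ascent on the ELBO
    rng = np.random.default_rng(seed)
    mu, log_sd = np.zeros(2), np.zeros(2)
    mu_sum, log_sd_sum, n_avg = np.zeros(2), np.zeros(2), 0
    for step in range(n_steps):
        sd = np.exp(log_sd)
        eps = rng.standard_normal((n_mc, 2))
        theta = mu + sd * eps  # reparameterization trick
        g = grad_log_p(theta)
        grad_mu = g.mean(axis=0)
        # d ELBO / d log_sd: expected-log-density term plus 1 from the Gaussian entropy
        grad_log_sd = (g * eps).mean(axis=0) * sd + 1.0
        mu += lr * grad_mu
        log_sd += lr * grad_log_sd
        if step >= n_steps // 2:  # average the second half of the iterates to damp noise
            mu_sum += mu
            log_sd_sum += log_sd
            n_avg += 1
    return mu_sum / n_avg, np.exp(log_sd_sum / n_avg)


mu_hat, sd_hat = fit_mean_field()

# Sampling from the fitted approximation: 1000 independent draws in one call
draws = mu_hat + sd_hat * np.random.default_rng(1).standard_normal((1000, 2))
print("fitted mean:", mu_hat.round(2), "fitted sd:", sd_hat.round(2))
```

With this target the fitted mean lands near `m`, and the fitted standard deviations land near 0.6 rather than the true marginal 1.0, because a diagonal (mean-field) Gaussian cannot represent the correlation in `Sigma`.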

What’s more, for a well-behaved posterior like this one, MFVI will typically converge to the same optimum regardless of starting conditions. Here’s an animation I made that shows how the VI approximation (the green blob) fits the true distribution (black contours and points) even from multiple starting points.
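
A deterministic toy version of that experiment, again a NumPy sketch rather than PyMC code: for a hypothetical Gaussian target the mean-field ELBO has closed-form gradients and is concave in the mean and in the log standard deviations, so gradient ascent reaches the same optimum from every starting point. The starting points and step size are arbitrary illustrative choices; a multimodal posterior, or PyMC's stochastic optimizer, need not behave this cleanly.

```python
import numpy as np

# Hypothetical target: a correlated 2-D Gaussian N(m, Sigma)
m = np.array([1.0, -2.0])
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)


def fit_from(mu0, log_sd0, lr=0.05, n_steps=2000):
    # Exact ELBO gradients for q = N(mu, diag(sd**2)) against a Gaussian target:
    #   d/d mu     = -Lam @ (mu - m)
    #   d/d log_sd = 1 - diag(Lam) * sd**2
    mu = np.asarray(mu0, dtype=float)
    log_sd = np.asarray(log_sd0, dtype=float)
    for _ in range(n_steps):
        sd = np.exp(log_sd)
        mu = mu + lr * (-Lam @ (mu - m))
        log_sd = log_sd + lr * (1.0 - np.diag(Lam) * sd**2)
    return mu, np.exp(log_sd)


starts = [([0.0, 0.0], [0.0, 0.0]), ([5.0, 5.0], [-1.0, -1.0]), ([-4.0, 3.0], [1.0, 0.5])]
results = [fit_from(mu0, log_sd0) for mu0, log_sd0 in starts]
for (mu0, _), (mu, sd) in zip(starts, results):
    print(f"start {mu0} -> mean {mu.round(3)}, sd {sd.round(3)}")
```

All three starts end at the same mean `m` and the same standard deviations of 0.6.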

There are blends of variational and MCMC methods explored in research, but that’s a very deep rabbit hole and they aren’t used much in PyMC. One notable exception is using MFVI to initialize the starting point for NUTS.
