ADVI start with initialization

Hey community!

I built a probabilistic model and I want to investigate the convergence of the latent parameters every N steps. The procedure can be summarized as follows:

import pymc3 as pm

N = 5000  # number of ADVI iterations per chunk
means = None
for dummy_i in xrange(10):
    with my_model:
        # resume ADVI from the means of the previous chunk
        means, std, elbos = pm.variational.advi(n=N, start=means)
    check_means(means)

I intend to track the model performance over iterations. However, I found that the result depends on the chunk length N. In my case, when N=5000, the model performance (AUC) fluctuates around 0.6; when N=10000, the performance approaches 1 and the final performance is around 0.9. It seems that stopping ADVI and continuing from the current means gives a different result.
So my question is: how can I stop the iteration and continue it correctly?
Thanks in advance.


The advi API you are using is a bit dated; if you update to master and use the new API, you can do the following:

with pm.Model() as model:
    # define your model
    ... 
    # set up the advi inference
    inference = pm.ADVI()

# train your model
approx = inference.fit(n=10000)

Now you have an approximation object approx, from which you can check the history of the ELBO, and also the mean-field approximation's mu and sd:

import numpy as np

elbos1 = -approx.hist  # shape = (10000,); hist stores the negative ELBO
# get the fitted values, mapped back to a dict keyed by variable name
gbij = approx.bij
cov = approx.cov.eval()
means = gbij.rmap(approx.mean.eval())
sds = gbij.rmap(np.diag(cov)**.5)

You can now check the fit by plotting the ELBO or by using the means and sds. If you want to train the model further, just do:

inference.fit(n=20000)

and again do

elbos1 = -approx.hist  # shape = (30000,): 10000 + 20000 iterations in total
# get the fitted value
gbij = approx.bij
cov = approx.cov.eval()
means = gbij.rmap(approx.mean.eval())
sds = gbij.rmap(np.diag(cov)**.5)

to get the updated means and sds.
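
For example, a minimal sketch of plotting the ELBO history (assuming matplotlib is available; approx.hist stores the negative ELBO, so it is negated):

import matplotlib.pyplot as plt

plt.plot(-approx.hist)  # negate the recorded loss to get the ELBO
plt.xlabel('iteration')
plt.ylabel('ELBO')
plt.show()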


Thanks for your detailed instructions @junpenglao. Very helpful! However, I still have a problem.
I defined my model with the following code:

import pymc3 as pm
import numpy as np
import theano

# y is a list of input matrices; a, b, and alpha are parameters.
y = input_mats
a = args.a
b = args.b
alpha = 2
N = y[0].shape[0]
n_source = len(y)
dim = args.k
M = [y[c].shape[1] for c in range(n_source)]
prior_mu = np.zeros([N, dim])
thetas = {}
with pm.Model() as model:
    sigma = pm.InverseGamma('sigma', alpha=alpha, shape=n_source)
    x = pm.Laplace('x', mu=prior_mu, b=b, shape=(N, dim))
    for c in range(n_source):
        thetas['theta' + str(c)] = pm.Dirichlet('theta' + str(c),
                                                a=(a / dim) * np.ones((M[c], dim)),
                                                shape=(M[c], dim))
    for c in range(n_source):
        pm.Normal('y' + str(c), mu=theano.tensor.dot(x, thetas['theta' + str(c)].T),
                  sd=pm.math.sqrt(sigma[c]) * np.ones((N, M[c])),
                  observed=y[c])
    inference = pm.ADVI()

I can use ADVI via the old interface:

with model:
    means, std, elbos = pm.variational.advi(n=10000)

This works fine for me. However, when I use the new API:

approx = inference.fit(n=10000)

It takes a very long time to start iterating.


Here you can see that inference.fit() takes a much longer time (40 minutes) to start iterating.
PS: My OS is Ubuntu 16.04, and I use Anaconda Python 2.7 and CUDA 8.0.

The two for loops are likely inefficient. You should try to rewrite them.

There might be ways to check which part of the initialization is slow using some profiling, @ferrine?

Also, regarding your original goal (i.e., checking the convergence of the latent parameters every N steps), you can write your own callback function.
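
For example, a minimal sketch of such a callback, assuming the new API passes (approx, losses, i) to each callback; track_means and means_history are just placeholder names:

means_history = []

def track_means(approx, losses, i):
    # record the current mean-field means every 1000 iterations
    if i % 1000 == 0:
        means_history.append(approx.mean.eval().copy())

with model:
    inference = pm.ADVI()
    approx = inference.fit(n=10000, callbacks=[track_means])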

Alternatively, you can copy the old variational API and change the code in L181-L201 to check the convergence internally (the old API will be removed soon).

I suppose Theano does graph optimizations, and that takes a lot of time.

Thanks for your advice. The parameter n_source is often a small number, e.g. 2 or 3.
I profiled those two lines using %%prun in IPython.
The results are as follows:

%%prun 
with model:
    means, std, elbos = pm.variational.advi(n=10000)

The output file is like:

[%%prun profile for the old API]

New API:

approx = inference.fit(n=10000)

The output is:

[%%prun profile for the new API]

It seems pickle.py and {repr} take up the majority of the running time.

Try doing thetas.append() and indexing it with thetas[c] instead of thetas['theta' + str(c)].
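
Something like this (a sketch of the same model block, swapping the dict for a list; the PyMC3 variable names stay unique strings):

thetas = []
with pm.Model() as model:
    sigma = pm.InverseGamma('sigma', alpha=alpha, shape=n_source)
    x = pm.Laplace('x', mu=prior_mu, b=b, shape=(N, dim))
    for c in range(n_source):
        thetas.append(pm.Dirichlet('theta' + str(c),
                                   a=(a / dim) * np.ones((M[c], dim)),
                                   shape=(M[c], dim)))
    for c in range(n_source):
        pm.Normal('y' + str(c), mu=theano.tensor.dot(x, thetas[c].T),
                  sd=pm.math.sqrt(sigma[c]) * np.ones((N, M[c])),
                  observed=y[c])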

I tried, but it still doesn’t work. I modified the old variational API as you suggested. Thanks :slight_smile: