Extracting variable means from an ADVI-fitted model

I'd like to know how to extract the estimated means from a model that was fit using ADVI. The documentation shows it like this:
fitted_model.mean.eval()
which returns a flat 1-D array with all the estimated means; playing around with the object's properties, I found other methods that return the same 1-D array.
However, I'm wondering in what order the variables are inserted into that array when fitting a model with many different variables.
My first thought was that the variables would appear in the order they are defined, but if I slice the array that way to extract the means, they don't end up matching. I checked by comparing against the mean of sampled draws:

samples = fitted_model.sample()
# stack draws along the last axis, then average over the draws
X_mean = np.mean(np.dstack([s['X'] for s in samples]), axis=2)

Is there any function to return the estimated means in a dictionary like the sample method does?


In a PyMC3 model there is a property to map a dict to an array and vice versa, under model.bijection. A similar property is available on the approximation class; for example, you can do:

    approx = pm.fit()
    # dict keyed by (transformed) variable names
    means_dict = approx.bij.rmap(approx.params[0].eval())
    # flat array from a single sampled point
    sample_array = approx.bij.map(approx.sample(1)[0])

Thanks. Now another question: the dict contains variables with names ending in _log. I tried a model where I defined variables from non-negative distributions and got negative numbers there, so I'm guessing I should take e^value. But after I do that, the numbers still don't match what I get from a sample mean. Do I need to perform some other transformation?

Doing exp(a_log__) is the correct way. What is the sample mean you are comparing it with?
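
(For concreteness, a minimal sketch of that back-transform, reusing means_dict from the snippet above:)

    import numpy as np

    # back-transform the log-space mean estimate to the original scale
    a_mean = np.exp(means_dict['a_log__'])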

I mean comparing with taking the mean of the samples returned by approx.sample() - i.e. it doesn't match

np.exp(approx.bij.rmap(approx.params[0].eval())['variable_of_interest'])

However, they seem to match a bit better if I try 2^value instead of e^value.

EDIT: sorry, that 2^value was just a coincidence, didn’t end up matching in further experiments. Nevertheless, np.exp(<bij.rmap means>) and approx.sample() values still don’t match.

They are not the same. approx.sample() samples from the approximation with the parameters in approx.params. For example, if you use ADVI, then calling approx.sample() draws from an MvNormal with the mean and standard deviation stored in approx.params.
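
Conceptually, a draw looks something like this. This is only a sketch of the idea, not the library internals; it assumes the mean-field parametrization, where params[0] is the mean and params[1] is an unconstrained rho with std = log1p(exp(rho)):

    import numpy as np

    mu = approx.params[0].eval()             # means in the transformed space
    rho = approx.params[1].eval()            # unconstrained std parameter (assumed)
    sd = np.log1p(np.exp(rho))               # softplus recovers the std (assumed)

    z = mu + sd * np.random.randn(mu.size)   # one draw in the transformed space
    point = approx.bij.rmap(z)               # back to a dict of named variables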

Sorry, maybe I posed the question incorrectly. Let's say I want some method to extract the means of each variable from which approx.sample() generates samples, ideally returned in a dictionary, so as to match what I would get from taking the mean of

[sample['variable_of_interest'] for sample in approx.sample()]

If you want to extract the means for each variable from the samples returned by approx.sample(), what you are doing is correct. And when you take a lot of samples, this mean should be really close to the mean extracted from approx.params.

I've been playing with it, and it works in some experiments where the approximated variables are Normal. However, in this example (hierarchical Poisson factorization) I keep getting different numbers (the results differ even more when using real data):

import numpy as np
import pymc3 as pm
import theano

nentries = 1000
nrows = 100
ncols = 100
latent_dim = 3

np.random.seed(1)
x = np.round(np.random.exponential(20, size=nentries), 0).astype('float32')
row_id = np.random.randint(nrows, size=nentries)
col_id = np.random.randint(ncols, size=nentries)

with pm.Model():
    user_activity = pm.Gamma('user_activity', .3, .3, shape=(nrows, 1))
    user_prior = theano.tensor.tile(user_activity, latent_dim, ndim=2)
    theta = pm.Gamma('theta', .3, user_prior, shape=(nrows, latent_dim))

    item_pop = pm.Gamma('item_pop', .3, .3, shape=(ncols, 1))
    item_prior = theano.tensor.tile(item_pop, latent_dim, ndim=2)
    beta = pm.Gamma('beta', .3, item_prior, shape=(ncols, latent_dim))

    xhat = theano.tensor.sum(theta[row_id] * beta[col_id], axis=1)
    R = pm.Poisson('R', mu=xhat, observed=x)
    HPF = pm.fit()

Then:

trace = HPF.sample(10000)
np.mean(np.dstack([sample['theta'] for sample in trace]), axis=2) - np.exp(HPF.bij.rmap(HPF.params[0].eval())['theta_log__'])

Is that the expected behavior? (All the numbers from approx.sample() are higher than those from approx.mean.eval().)

You are right, I was not clear on my side. The correct way to do it is:

import matplotlib.pyplot as plt

trace = HPF.sample(10000)
plt.plot(np.log(trace['theta']).mean(0).flatten())
plt.plot(HPF.bij.rmap(HPF.params[0].eval())['theta_log__'].flatten())

Notice that the mean of log(theta) is equal to theta_log__ from the approximation, but not the other way around. The reason is that the expectation of a function of x, E[f(x)], is in general not equal to the function of the expectation, f(E[x]).
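
A quick numerical illustration in plain NumPy (independent of the model above): for x ~ Normal(0, 1), E[exp(x)] = exp(1/2) ≈ 1.65, while exp(E[x]) = exp(0) = 1.

    import numpy as np

    x = np.random.RandomState(0).normal(size=1000000)
    print(np.exp(x).mean())   # ≈ 1.65, i.e. exp(mu + sigma**2 / 2)
    print(np.exp(x.mean()))   # ≈ 1.0,  i.e. exp(mu)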

You can check that on each sample, the forward and backward transforms are equally valid:

trace = HPF.sample(1)
np.testing.assert_almost_equal(trace['theta_log__'], np.log(trace['theta']))
np.testing.assert_almost_equal(np.exp(trace['theta_log__']), trace['theta'])

But this is not the case when you take the expectation/mean over all the values.
Hope this is now clearer :wink:

Oh right! Realized my mistake in the reasoning, thanks for your help!

I have a similar problem. I am running the code below:

from pymc3.variational.callbacks import CheckParametersConvergence

advi_fit = pm.fit(method=pm.ADVI(), callbacks=[CheckParametersConvergence()])

When I run the commands below to get the mean and standard deviation:

means_dict = advi_fit.bij.rmap(advi_fit.params[0].eval())
stds_dict = advi_fit.bij.rmap(advi_fit.params[1].eval())

I see some variables renamed with a _lowerbound__ suffix, for example alpha_lowerbound__, while others have kept their original names.

I am not sure what the lower bound means here, and most of the "standard deviation" values are negative. I used a truncated normal with lower bound = 0, and for those variables the mean value is less than zero. I think I lack an understanding of the output of this code. Ideally, I am looking to get the mean and standard deviation of each variable.
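
For reference, I also tried back-transforming under the guess (I am not sure this is right) that params[1] stores an unconstrained parameter rho with std = log1p(exp(rho)), and that the _lowerbound__ variables live in the transformed, unbounded space:

    import numpy as np

    rho_dict = advi_fit.bij.rmap(advi_fit.params[1].eval())
    # if std = softplus(rho), negative entries in rho would still give positive stds
    sd_dict = {name: np.log1p(np.exp(rho)) for name, rho in rho_dict.items()}

Is that the right way to recover the standard deviations?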