How to extract the variational posterior expected matrix of regression weights from a categorical model?

Hi,

I have a model of type bambi.models.Model with family "categorical" and link "softmax". For this model, the learned regression weights can be expressed as an M x K matrix, where M is the number of covariates and K is the number of categories observed in the training data. (With an identifiability constraint imposed, one column of this matrix is fixed to 0, so we could represent it as an M x (K-1) matrix; however, I will ignore this detail for the sake of this discussion.)

I then used PyMC3's mean-field ADVI to fit this model, which gives me an object of type pymc3.variational.approximations.MeanField. If I call that object model_fitted_by_ADVI, I can extract the posterior expected regression weights via model_fitted_by_ADVI.mean.eval(), but they come back as a flat vector, so it is unclear how to reconstruct the weight matrix from it: I do not know whether the entries of the vector represent the matrix unfolded by row, by column, or by some other mechanism.
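To make the ambiguity concrete, here is a small numpy sketch (with a made-up 2x3 matrix, nothing to do with the actual model) showing that the same matrix flattens to two different vectors depending on the unfolding order:

```python
import numpy as np

# Hypothetical 2x3 weight matrix, flattened two different ways.
W = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
flat_row_major = W.flatten(order="C")   # [0, 1, 2, 3, 4, 5]
flat_col_major = W.flatten(order="F")   # [0, 3, 1, 4, 2, 5]

# The inverse reshape only recovers W if it uses the matching order.
assert np.array_equal(np.reshape(flat_row_major, (2, 3), order="C"), W)
assert np.array_equal(np.reshape(flat_col_major, (2, 3), order="F"), W)
```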

What is the recommended way to reconstruct the regression weight matrix from the flat vector (or to access it directly)?

Thank you kindly in advance for any guidance that you might be able to share.

Hi!

Could you share a snippet that allows reproducing the situation you describe? That would be really helpful so I can test it on my side before trying to give an answer.

Thanks!


Sure thing.


import bambi as bmb
import pandas as pd

# Get some data for a categorical regression
url_to_iris_data = "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv"
df = pd.read_csv(url_to_iris_data)
df = df.astype({"species": "category"})

# Construct the bambi model
formula_as_string = "species ~ sepal_length + sepal_width + petal_length + petal_width"
model = bmb.Model(formula_as_string, df, family="categorical")

# Fit with ADVI
advi = model.fit(method="advi")
model_fitted = advi.fit(10000)

# Extract information about the variational family after optimization
expected_beta_as_flat_vector = model_fitted.mean.eval()


My question: how do I convert beta from a vector to a matrix?
There are M=4 covariates and K=3 categories, so with an intercept term there should be K*(M+1)=15 regression coefficients, which we can represent as an (M+1) x K matrix. Under an identifiability constraint, one column is set to the zero vector, leaving (M+1) x (K-1) = 10 entries. expected_beta_as_flat_vector gives us those ten entries, but it is unclear how to reconstruct the (M+1) x K matrix from them.
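For what it's worth, here is the reconstruction I would do if I knew the unfolding order and the position of the baseline column; the row-major order and baseline-first placement below are assumptions, which is exactly what I'm unsure about:

```python
import numpy as np

# Sketch of the reconstruction, ASSUMING the 10 entries are the
# (M+1) x (K-1) block in row-major order and that the baseline
# category (the all-zeros column) comes first. Both assumptions
# need to be verified against bambi/pymc3.
M, K = 4, 3
flat = np.arange(10.0)  # stand-in for expected_beta_as_flat_vector
beta_nonbaseline = flat.reshape(M + 1, K - 1)
beta_full = np.hstack([np.zeros((M + 1, 1)), beta_nonbaseline])
assert beta_full.shape == (M + 1, K)
```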

Hi ashman,

I came across this problem a while back too. What worked for me was to first run ADVI:

res = pm.fit(method='advi', n=100000)

And then use the following code to get the mean and standard deviation dictionaries:

means_pymc3 = res.params[0].eval()  # flat vector of posterior means
sds_pymc3 = res.std.eval()          # flat vector of posterior standard deviations

pymc3_mean_dict = res.bij.rmap(means_pymc3)  # map back to per-variable arrays
pymc3_sd_dict = res.bij.rmap(sds_pymc3)
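In case it's useful, conceptually bij.rmap just splits the flat vector by each variable's size and reshapes each piece back to its original shape. A toy numpy sketch of that idea (the variable names and shapes here are made up):

```python
import numpy as np

# Toy ordered mapping of variable names to shapes (made up for illustration).
shapes = {"Intercept": (2,), "sepal_length": (2,), "b": (3, 2)}

flat = np.arange(2 + 2 + 6, dtype=float)  # stand-in for res.params[0].eval()

# Split the flat vector by each variable's size, then reshape.
unpacked, offset = {}, 0
for name, shape in shapes.items():
    size = int(np.prod(shape))
    unpacked[name] = flat[offset:offset + size].reshape(shape)
    offset += size

assert unpacked["b"].shape == (3, 2)
```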

Hope this helps, let me know.


Thanks, Martin. That is very helpful!

I can almost reconstruct the beta matrix from this. The only remaining piece is to determine which category has been assigned the zero vector for its regression weights (including the intercept). Is there a way to do this programmatically?

(So I’m now hoping to un-ignore the detail I initially said I’d ignore for the sake of this discussion :grin: )

It’s been a while since I used Bambi, but I think the baseline was the first level of the categorical, which by default is the first alphabetically. I may be wrong though; probably a question for @tcapretto!


There are a couple of places where you can look for the information.

The Model object has an attribute called response, which represents the response of the model.

model.response
# ResponseTerm(name: species, shape: (150,), levels: ['setosa', 'versicolor', 'virginica'], baseline: setosa)

There you can see the levels of the response variable are 'setosa', 'versicolor', and 'virginica', and the level taken as the baseline, i.e. the one with 0 coefficients, is 'setosa'.

You can access this information programmatically

model.response.levels
# ['setosa', 'versicolor', 'virginica']

and then

model.response.baseline
# ['setosa']

You can also see the PyMC coordinates associated with this response term

model.response.pymc_coords
# {'species_coord': ['versicolor', 'virginica']}

and since there’s no 'setosa' you can conclude it is the one being removed.
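Putting it together, one way to find the dropped level programmatically is to diff the two lists (here using the outputs shown above as stand-ins for the model attributes):

```python
levels = ["setosa", "versicolor", "virginica"]  # model.response.levels
kept = ["versicolor", "virginica"]              # model.response.pymc_coords["species_coord"]

# The baseline is whatever appears in the levels but not in the coords.
baseline = [lvl for lvl in levels if lvl not in kept]
assert baseline == ["setosa"]
```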

An additional note

For the moment, Bambi takes the first level as the reference. If I recall correctly, we can't (at least not easily) change which level Bambi picks. However, you can always reorder the categories of the pandas categorical series, and Bambi will take the new first level as the baseline.
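For example, to make 'versicolor' the baseline you could reorder the categories before creating the model (a minimal pandas sketch with a toy series):

```python
import pandas as pd

# Toy series with the same levels as the iris example.
species = pd.Series(["setosa", "versicolor", "virginica"], dtype="category")

# Put 'versicolor' first so Bambi would take it as the baseline.
species = species.cat.reorder_categories(["versicolor", "setosa", "virginica"])
assert list(species.cat.categories)[0] == "versicolor"
```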

I want to keep improving this functionality, so this is subject to change in future releases.

Let me know if you have any other questions :smiley:


I’m not familiar with ADVI, but there are a couple of things I’m worried about.

  1. The results from ADVI and the results from the default NUTS are quite different (if I look at the posterior means/sds). Are these numbers supposed to match?
  2. I’m almost sure the coefficient for the intercept is not right. We use a centered version of the data when creating the underlying PyMC model and we adjust the Intercept once we have the posterior samples. Unfortunately, we only do that for MCMC, not for ADVI.
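For reference, the adjustment itself is simple arithmetic: if the model is fit on centered covariates, the intercept on the original scale is the centered intercept minus the slopes times the column means, since a_c + b @ (x - x_mean) == (a_c - b @ x_mean) + b @ x. A generic numpy sketch with made-up numbers (not Bambi's actual code):

```python
import numpy as np

# Made-up fitted values for a model on centered covariates.
intercept_centered = 1.5        # intercept reported for centered X
slopes = np.array([0.4, -0.2])  # slope estimates (unchanged by centering)
x_means = np.array([5.8, 3.0])  # column means used for centering

# Shift the intercept back to the original covariate scale.
intercept_original = intercept_centered - slopes @ x_means
```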

I’m sorry, but since I’m not familiar with ADVI I haven’t been able to take care of the issues associated with its implementation. There are a couple of things I could fix, but I can’t do it in the immediate future.

Edit

This is where we rescale the intercept.


Just a quick comment: @tcapretto when you say ADVI looks off, is that with 10000 iterations (10^4)? If so, I’d also try much larger values, e.g. at least 10^5. In my experience, 10^4 is rarely enough, and 10^5 can be much better. For a simple model like this I would have thought ADVI might not be so bad. Also, I’ve found it very important to scale the data.
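On the scaling point: what I usually do is standardize each covariate before fitting, e.g. (plain numpy sketch with made-up numbers):

```python
import numpy as np

# Made-up design matrix; standardize each column to mean 0, sd 1.
X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]])
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

assert np.allclose(X_scaled.mean(axis=0), 0.0)
assert np.allclose(X_scaled.std(axis=0), 1.0)
```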