Using Dirichlet instead of Categorical for ADVI

@junpenglao
Hi,

I have a question about using ADVI with PyMC3. If my model has a Categorical prior, ADVI cannot be used because a discrete variable is not differentiable. If I replace it with a Dirichlet, I do not get the result I expect.

For example:

import pymc3 as pm
import numpy as np
alpha = np.ones((1,4))
with pm.Model() as model:
    user_community = pm.Dirichlet('user_comm', a = alpha, shape = (1,4))
    #user_community = pm.Categorical('user_comm', p = alpha, shape = (1,4))

with model:
    tr = pm.sample(chains = 1)

For Dirichlet, I am getting output as:

tr['user_comm'][:1]
Out[43]: array([[[0.50208397, 0.07212837, 0.07184347, 0.3539442 ]]])

But if I have K groups to cluster into, I expected each element of the (1, K) array to be Dirichlet distributed, because when I use Categorical it gives me the selected group, for example:

tr['user_comm'][:1]
Out[46]: array([[[1, 2, 0, 0]]])

Here I can see that these could be group numbers, unlike what the Dirichlet gives me. With the Dirichlet I cannot tell which group an element belongs to, since it just gives me a row whose values sum to 1.

Any suggestions much appreciated.

Thank you.

Yes that’s right.

The way you wrote the model is a bit odd. First, Categorical returns a discrete label, so it has one dimension fewer than its parameter. To avoid confusion, you should rewrite it as:

n, K = 4, 4
with pm.Model() as m:
    user_community = pm.Categorical('user_comm', p = alpha, shape = (n,1))

and

n, K = 4, 4
with pm.Model() as m:
    user_community = pm.Dirichlet('user_comm', a = alpha, shape = (n, K))

In other words, the expectation in your question is not right.

If you have K groups and n repeated observations, and you want to assign each observation to one group, expect Categorical to return an (n,) vector with each element in {0, 1, …, K-1}, whereas Dirichlet returns an (n, K) array in which each row sums to 1.
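To make the shape difference concrete, here is a minimal NumPy sketch (plain NumPy rather than PyMC3, so the names `labels` and `weights` are just illustrative). It draws n Categorical labels and n Dirichlet vectors and compares their shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 4, 4  # n observations, K groups (same sizes as in the thread)

# Categorical: one integer label per observation, each in {0, ..., K-1}.
p = np.full(K, 1.0 / K)
labels = rng.choice(K, size=n, p=p)

# Dirichlet: one probability vector of length K per observation.
weights = rng.dirichlet(np.ones(K), size=n)

print(labels.shape)   # (4,)
print(weights.shape)  # (4, 4)
```

So a Dirichlet draw is never a group label by itself; each row is a set of K group probabilities.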

As for using Dirichlet, you can take user_community.argmax(axis=1), but that is more of a hack. A proper treatment is to rewrite the model as a (marginalized) mixture model.
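For illustration, the argmax hack looks roughly like this on a plain NumPy array. The numbers in `weights` are made up for the example; on an actual trace, which carries an extra leading dimension for the draws, the last axis would presumably be the one to reduce over (axis=-1).

```python
import numpy as np

# One row per observation, one column per group; each row sums to 1.
weights = np.array([
    [0.50, 0.07, 0.07, 0.36],
    [0.10, 0.60, 0.20, 0.10],
])

# Pick the most probable group for each row.
hard_labels = weights.argmax(axis=1)
print(hard_labels)  # [0 1]
```

Note that this throws away the uncertainty in each row, which is one reason it is only a workaround.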

Thanks for your reply.

So in that case, what would be a better replacement for the Categorical prior when running ADVI? Even if I write a function that marginalizes out the latent variable, that latent variable should still have a continuous prior, right?

Yes (plus, you cannot have discrete RVs in ADVI). And yes, in a marginalized model all variables are continuous.
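As a rough sketch of what "marginalized" means here: the discrete label is summed out of the likelihood, so only continuous quantities (weights and component parameters) remain, and the group membership can still be recovered afterwards as posterior probabilities. The thread does not specify a likelihood, so the Gaussian components with unit standard deviation, the toy data, and the helper names below are all assumptions for illustration. In PyMC3 the same idea is what pm.Mixture / pm.NormalMixture provide.

```python
import numpy as np

def log_normal_pdf(x, mu, sigma):
    """Elementwise log density of Normal(mu, sigma) at x."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def marginal_loglik_and_resp(x, w, mu, sigma=1.0):
    """Sum the discrete label out of a K-component Gaussian mixture.

    Returns the per-observation marginal log-likelihood, shape (n,), and the
    posterior group probabilities ("responsibilities"), shape (n, K).
    """
    # log w_k + log N(x_i | mu_k, sigma) for every (i, k) pair -> (n, K)
    log_joint = np.log(w)[None, :] + log_normal_pdf(x[:, None], mu[None, :], sigma)
    # Numerically stable log-sum-exp over the K groups.
    m = log_joint.max(axis=1, keepdims=True)
    log_marginal = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    resp = np.exp(log_joint - log_marginal[:, None])
    return log_marginal, resp

# Hypothetical toy data: two well-separated groups.
x = np.array([-2.1, -1.9, 2.0, 2.2])
w = np.array([0.5, 0.5])    # mixture weights (Dirichlet-distributed in a full model)
mu = np.array([-2.0, 2.0])  # component means (continuous parameters)

log_marginal, resp = marginal_loglik_and_resp(x, w, mu)
print(resp.argmax(axis=1))  # [0 0 1 1]
```

Each row of `resp` sums to 1 and plays the role the Categorical label played before, but as probabilities rather than a hard assignment.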