@junpenglao
Hi,
I have a question about using ADVI with pymc3. If my model has a Categorical prior, ADVI can't be used, since a discrete variable is not differentiable. If I replace it with a Dirichlet, I don't get the result I expect.
For example:
import pymc3 as pm
import numpy as np
alpha = np.ones((1,4))
with pm.Model() as model:
    user_community = pm.Dirichlet('user_comm', a = alpha, shape = (1,4))
    #user_community = pm.Categorical('user_comm', p = alpha, shape = (1,4))
with model:
    tr = pm.sample(chains = 1)
For Dirichlet, I am getting output as:
tr['user_comm'][:1]
Out[43]: array([[[0.50208397, 0.07212837, 0.07184347, 0.3539442 ]]])
But with K groups to cluster into, I expected each element of the (1, K) variable to be Dirichlet distributed, because when I use Categorical it gives me the selected group, for example:
tr['user_comm'][:1]
Out[46]: array([[[1, 2, 0, 0]]])
Here I can read the values as group numbers, unlike what the Dirichlet gives me. With the Dirichlet I cannot tell which group an element belongs to, since it just gives me a row whose values sum to 1.
Any suggestions much appreciated.
Thank you.
Yes that’s right.
Your way of writing down the model is a bit odd. First, Categorical returns a discrete label, so its output has one dimension fewer than its parameter. To avoid confusion, you should rewrite it as:
n, K = 4, 4
with pm.Model() as m:
    user_community = pm.Categorical('user_comm', p = alpha, shape = (n,1))
and
n, K = 4, 4
with pm.Model() as m:
    user_community = pm.Dirichlet('user_comm', a = alpha, shape = (n, K))
In other words, expecting the Dirichlet to return group labels is wrong.
If you have K groups and n repeated observations, and you want to assign each observation to one group, expect Categorical to return an (n, ) vector with each element in (0, 1, …, K-1), whereas Dirichlet returns an (n, K) array in which each row sums to 1.
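To make the difference in shapes concrete, here is a minimal sketch that uses only the Python standard library rather than pymc3. The helper `draw_dirichlet` is a hypothetical name introduced for illustration; it relies on the standard fact that normalized independent Gamma draws are Dirichlet distributed.

```python
import random

def draw_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

rng = random.Random(0)
n, K = 4, 4
alpha = [1.0] * K

# Dirichlet: one length-K weight vector per observation, so n rows of K values,
# each row summing to 1.
weights = [draw_dirichlet(alpha, rng) for _ in range(n)]

# Categorical: one integer label in {0, ..., K-1} per observation, so n labels.
labels = [rng.choices(range(K), weights=w)[0] for w in weights]

print(len(weights), len(weights[0]))  # number of rows, number of columns
print(labels)
```

The Dirichlet draw is a set of group probabilities, while the Categorical draw is a group label generated from such probabilities.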
As for using Dirichlet, you can do user_community.argmax(axis=1), but again, this is more of a hack; the proper treatment is to rewrite the model as a (marginalized) mixture model.
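As a rough illustration of what marginalizing means here, the sketch below sums the discrete label out of a mixture likelihood in plain Python. It assumes Gaussian components with a shared standard deviation; the weights, means, and the function names `marginal_loglik` and `responsibilities` are made up for the example and are not part of pymc3. In pymc3 itself, `pm.Mixture` and `pm.NormalMixture` provide likelihoods of this kind.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a Normal(mu, sigma) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_loglik(x, w, mus, sigma):
    """log p(x) = log sum_k w_k N(x | mu_k, sigma); the label is summed out."""
    return logsumexp([math.log(wk) + normal_logpdf(x, mk, sigma)
                      for wk, mk in zip(w, mus)])

def responsibilities(x, w, mus, sigma):
    """Posterior p(z = k | x), computed afterwards instead of sampling z."""
    joint = [math.log(wk) + normal_logpdf(x, mk, sigma) for wk, mk in zip(w, mus)]
    norm = logsumexp(joint)
    return [math.exp(j - norm) for j in joint]

# Hypothetical values: K = 4 equally weighted, well separated components.
w = [0.25, 0.25, 0.25, 0.25]
mus = [-3.0, -1.0, 1.0, 3.0]
sigma = 0.5

r = responsibilities(2.9, w, mus, sigma)
best = max(range(len(r)), key=r.__getitem__)  # most probable group for x = 2.9
print(best, [round(p, 4) for p in r])
```

Only the continuous quantities (weights, means, scale) remain as parameters, which is what makes the model usable with ADVI; group membership is read off from the responsibilities rather than from a sampled label.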
Thanks for your reply.
So in that case, what would be a better replacement for the Categorical prior when running ADVI? Even if I write a function that marginalizes out the latent variable, that latent variable would still need a continuous prior, right?
Yes (and you cannot have discrete RVs in ADVI anyway). And yes, in a marginalized model all variables are continuous.