I made two versions of some model, one in which I explicitly have a categorical variable and another in which it is marginalised. It gave some tension between the final result of my two models (below one sigma, so nothing I was really worried about). First I thought it was just a Monte-Carlo error because sampling discrete variables is difficult, but it persisted. Eventually I reproduced the issue with a minimal example, here it is:
toydata = np.ones(10)
nonmargtoy_model = pm.Model() with nonmargtoy_model: w=pm.Dirichlet('w', a=np.ones(5),shape=5) t=pm.Categorical('t',p=w,testval=1) Y_obs = pm.Normal("Y_obs",mu=t,sigma=1,observed=toydata) nonmargtoy_trace = pm.sample()
margtoy_model = pm.Model() with margtoy_model: w=pm.Dirichlet('w', a=np.ones(5),shape=5) #t=pm.Categorical('t',p=w,testval=1) Ndists = [ pm.Normal.dist(mu=t,sigma=1) for t in range(5) ] Y_obs = pm.Mixture("Y_obs",w=w,comp_dists=Ndists,observed=toydata) margtoy_trace = pm.sample()
Do you agree that the two models are the same modulo a marginalisation and should give the same w posterior distribution?
Here are the results for the non-marginalised:
and the marginalised:
Both have some preference for 1 (although in my more complicated model the preferred value does change a bit) but it’s not the same strength at all. The non-marginalised version seems to be wrong: w gives a weaker preference to 1, staying quite close to its prior despite my data being 1 everywhere, and it doesn’t select 0 and 2 as more likely than 3 or 4.
What am I missing?
I tried several samplers. In particular Metropolis, in case there was an issue with the automatic differentiation to get the force of the HMC (since w only impact the likelihood through t, I am not sure how these derivatives could be computed). It didn’t change anything.
This is on PyMC3, v3.11.4, and I’m not willing to update it right now because apart from this detail the project is quite advanced and I was about to submit a paper in the next few days.