I am currently using PyMC4. I have recently seen behavior that I don’t understand, probably because I don’t know the details of how the PyMC inference process works.
I have a model that usually works as expected and produces reasonable results. Internally, the model maintains some Normally distributed variables (actually a vector of independent Normal RVs) and various transformations; the observed data is modelled as a function of all of these.
I was interested in monitoring the implied distribution of an auxiliary quantity that does not appear explicitly in the model. This quantity/variable is Normally distributed, with parameters taken from variables that are already in the model.
What happened next surprised me, so I tried a few variations. The results seem fairly unambiguous: things do not work the way I thought they did.
My first thought was just to add a named pm.Normal() at the end of the model, without any observed values, and monitor this variable in the trace/inference data generated by the model. Strange things started to happen, so I tried another approach, using pm.Normal.dist() and returning values in a pm.Deterministic().
This gave three versions of the model:
1.- without tracking the auxiliary variable.
2.- including pm.Normal('aux', …), without observed data
3.- including pm.Deterministic('aux', pm.Normal.dist(…))
Versions 1 and 3 have stable sampling and ‘work’ as expected, but version 2 does not. Looking at the sampling chains with ArviZ shows that for version 2 they are not stable: the sampler appears to wander into strange parts of the parameter space. To confirm this, I ran each of the three versions ten times, and the results were consistent.
I do not know how the PyMC sampling mechanism works, but I have always assumed that it constructs a graph of all the RVs in the model, builds a joint likelihood function working back from the observed data nodes, and then does inference based on that. On that assumption, adding an extra RV that is not on any path between an observed RV and its intermediate or independent driver variables would not affect the behavior of the sampler.
What I am seeing implies that my understanding of how this works is wrong. Can anyone who knows this in more detail than I do shed light on it?
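Written out, the assumption above is roughly the following, where $\theta$ stands for the variables already in the model and $y$ for the observed data (notation introduced here only for illustration). Version 2 formally adds a factor for aux to the joint density, but integrating aux out should leave the posterior over $\theta$ unchanged:

$$
p(\theta, \mathrm{aux} \mid y) \;\propto\; p(y \mid \theta)\, p(\theta)\, p(\mathrm{aux} \mid \theta),
\qquad
\int p(\theta, \mathrm{aux} \mid y)\, \mathrm{d}\,\mathrm{aux} \;=\; p(\theta \mid y)
$$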