I have been playing around with the great "Simpson's paradox and mixed models" example from the PyMC example gallery, and I noticed the following:
i) Increasing the number of groups from 5 to 50 makes it necessary to use `pm.sample(target_accept=0.999, tune=2000, draws=10000)`
in order to avoid warnings about divergences, too-large r-hat values,
etc.
Is there an upper limit on the number of groups that PyMC can handle in general
- in the sense that beyond it sampling "gets stuck", requiring ever larger target acceptance probabilities and draw counts?
ii) My posterior expectation for the population-level slope (the slope_mu
hyperprior in the hierarchical model from the example) does not seem to converge to the true value of -0.5; instead it systematically overestimates it, with the posterior distributed around -0.45. If I increase the number of observations per group (e.g. by a factor of ten), the posterior peaks around -0.5 as expected. Intuitively, I would have expected a similar effect from increasing the number of groups (by a factor of ten), since the total number of observations entering the population-level slope estimate is the same. Is there an intuitive explanation for why increasing the number of samples per group and increasing the number of groups have different effects on the posterior of the population-level parameters? Could this be caused by the choice of hyperprior parameters (sigma=1)?