How to parametrize random effects in Nested ANOVA

I am having some trouble replicating example 1 in this tutorial. Briefly, measurements (S4DUGES) were collected in two SEASONs at 6 different SITEs (3 in WINTER and 3 in SUMMER).

I get several divergences. However, the estimates for the fixed/random effects and the family (unexplained) variance sigma are close enough. I suspect I am not parametrizing the random effects correctly. One thing in particular I am not sure about is how to model the fact that 3 sites belong to WINTER and 3 sites belong to SUMMER. This is to ensure that the

[…] standard deviation between sites is within seasons (not standard deviation between all Sites) […]
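To be concrete, this is the nesting I am trying to encode (the variable names and the 10-observations-per-site layout below are my own illustration, not from the tutorial):

```python
import numpy as np

# Hypothetical encoding of the nesting: 6 sites, the first 3 in WINTER (0),
# the last 3 in SUMMER (1). Each observation carries a site index, and each
# site carries a season index.
season_of_site = np.array([0, 0, 0, 1, 1, 1])  # site -> season
site_idx = np.repeat(np.arange(6), 10)         # obs -> site (10 obs per site)
season_idx = season_of_site[site_idx]          # obs -> season, via the site

# The site effect is then a deviation around its season's mean, so the
# group-level sd measures variation between sites *within* a season:
#   site_effect[j] ~ Normal(0, sigma_site)
#   mu[i] = beta_season[season_idx[i]] + site_effect[site_idx[i]]
```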

I tried to replicate the Stan hierarchical and matrix parametrizations from the tutorial (there are several implementations there, slightly different from each other). There are also some missing values (0s) in the dataset which I did not remove (to keep the results comparable).

Could you please have a look at my notebook (here)? Do you have any suggestions on how to improve the model?

You should remove the step=pm.NUTS() from your sampling call. Passing it inhibits the auto-tuning of the NUTS sampler (the default in PyMC3 is a variation of NUTS that adapts the mass matrix for the transition kernel), which leads to a suboptimal trajectory/step size and, in turn, to divergences.

Also, FYI: I have a WIP project on fitting mixed-effect models in Python using PyMC3, PyStan, and more:


I tried removing step=pm.NUTS(). I did not know about this trick. The results, however, do not change much: still many divergences. (By the way, now that I look more closely at the graph in the tutorial, it seems the author had some divergent chains as well.)

I knew about your repository. I got some inspiration from there as well :slight_smile:

Using the non-centred version and increasing target_accept in the sampling call seems to reduce the divergences quite a lot:

trace = pm.sample(draws=5000, tune=1500, nuts_kwargs=dict(target_accept=.99))
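For reference, the non-centred trick replaces a direct draw of the group effects with a standardized draw that is then scaled. In plain NumPy the two give the same marginal distribution (names here are illustrative, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.5   # group-level sd (sigma_Z in the notebook, value assumed)
n_sites = 6

# Centred: draw the effects directly. In a hierarchical posterior this
# couples the effects to sigma and produces the classic funnel geometry.
centred = rng.normal(0.0, sigma, size=n_sites)

# Non-centred: draw standardized offsets, then scale by sigma.
# Marginally identical, but the sampler only sees independent N(0, 1)s.
z = rng.normal(0.0, 1.0, size=n_sites)
non_centred = sigma * z

# In PyMC3 the non-centred version would look like (hypothetical names):
#   z = pm.Normal('z', 0., 1., shape=n_sites)
#   site_effect = pm.Deterministic('site_effect', sigma_Z * z)
```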

However, I am still consistently seeing 1 or 2 divergences even with target_accept=.99. I think the problem is the Cauchy distribution for sigma, which is no longer recommended. For example, quoting Dan Simpson from a recent post:

There are a number of priors that decay away from the base model and give us some form of containment. Here is an incomplete, but popular, list of simple containment priors:

A half-Gaussian on the standard deviation
An exponential on the standard deviation
A half-t with 7 degrees of freedom on the standard deviation
A half-t with 3 degrees of freedom on the standard deviation
A half-Cauchy on the standard deviation.
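To see why the half-Cauchy has fallen out of favour as a default, compare how much prior mass each option leaves far out in the tail. A quick stdlib computation, assuming unit scale for both priors:

```python
import math

x = 10.0  # a value far in the tail, in units of the prior scale

# P(X > 10) under a half-normal(1) prior on the sd: essentially zero.
p_halfnormal = math.erfc(x / math.sqrt(2))       # ~1.5e-23

# P(X > 10) under a half-Cauchy(1) prior on the sd: still substantial.
p_halfcauchy = 1 - (2 / math.pi) * math.atan(x)  # ~0.064
```

So with a half-Cauchy roughly 6% of the prior mass sits above 10 scale units, which is exactly the kind of extreme region where the sampler struggles.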

Try using the following code to diagnose where the divergence is coming from:

import matplotlib.pyplot as plt
import pymc3 as pm

# Pair plot with the divergent samples highlighted
pm.pairplot(trace, varnames=['beta_X', 'sigma_Z',
                             'sigma', 'gamma_Z'], divergences=True, alpha=.01);

# Overlay the divergent traces (green) on the non-divergent ones (black)
tracedf = pm.trace_to_dataframe(trace)
divergent = trace['diverging']

_, ax = plt.subplots(1, 1, figsize=(15, 4))
ax.plot(tracedf.values[divergent == 0].T, color='k', alpha=.01)
ax.plot(tracedf.values[divergent == 1].T, color='C2', lw=.5)
labels = tracedf.columns
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right");

Then adjust the prior to put more weight on the more probable region (i.e., cut away the tail of the prior where the coefficient is unlikely to be observed).

This is all great info, thank you @junpenglao. As you suggested, and after looking at the plots, I tweaked some parameters to reduce the standard deviations (especially for sigma_Z and beta_X) and increased the acceptance target. This reduced the divergences quite a bit, though I still get some rare ones. I had sort of forgotten about this nice notebook on the PyMC3 website on how to diagnose a model. A good read; I should have studied it more carefully before. I am surprised, though, at how many tricky problems I faced doing this simple exercise…

The problem really is sigma_Z: you can see that the posterior is mostly below .5.
I would put a prior of sigma_Z = pm.HalfNormal('sigma_Z', .5).

Exactly. I set sigma_Z = pm.HalfNormal('sigma_Z', 1) and it seems to work well. Thank you again!