How to parametrize random effects in Nested ANOVA

I am having some trouble replicating example 1 in this tutorial. Briefly, measurements (S4DUGES) were collected in two SEASONs at 6 different SITEs (3 in WINTER and 3 in SUMMER).

I get several divergences. However, the estimates for the fixed/random effects and the family (unexplained) variance sigma are close enough. I suspect I am not parametrizing the random effects correctly. One thing in particular I am not sure about is how to model the fact that 3 sites belong to WINTER and 3 sites belong to SUMMER. This is to ensure that the

[…] standard deviation between sites is within seasons (not standard deviation between all Sites) […]
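To be concrete, this is the nesting I am trying to encode (the variable names and the 10-observations-per-site layout below are my own illustration, not from the tutorial):

```python
import numpy as np

# Hypothetical encoding of the nesting: 6 sites, the first 3 in WINTER (0),
# the last 3 in SUMMER (1). Each observation carries a site index, and each
# site carries a season index.
season_of_site = np.array([0, 0, 0, 1, 1, 1])  # site -> season
site_idx = np.repeat(np.arange(6), 10)         # obs -> site (10 obs per site)
season_idx = season_of_site[site_idx]          # obs -> season, via the site

# The site effect is then a deviation around its season's mean, so the
# group-level sd measures variation between sites *within* a season:
#   site_effect[j] ~ Normal(0, sigma_site)
#   mu[i] = beta_season[season_idx[i]] + site_effect[site_idx[i]]
```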

I tried to replicate the Stan hierarchical and matrix parametrizations from the tutorial (there are several implementations there, slightly different from each other). There are also some missing values (0s) in the dataset which I did not remove (to keep the results comparable).

Could you please have a look at my notebook (here)? Do you have any suggestions on how to improve the model?

You should remove the step=pm.NUTS() from your sampling call. Passing it inhibits the auto-tuning of the NUTS sampler (the default in PyMC3 is a variation of NUTS that adapts the mass matrix for the transition kernel), which leads to a suboptimal trajectory/step size and, in turn, to divergences.

Also, FYI: I have a WIP project on fitting mixed-effect models in Python using PyMC3, PyStan, and more:


I tried removing step=pm.NUTS(). I did not know about this trick. The results, however, do not change much: still many divergences. (By the way, now that I look more closely at the graph in the tutorial, it seems the author had some divergent chains as well.)

I knew about your repository. I got some inspiration from there as well :slight_smile:

Using the non-centred version and increasing target_accept in the sampling call seems to reduce the divergences quite a lot:

trace = pm.sample(draws=5000, tune=1500, nuts_kwargs=dict(target_accept=.99))
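For reference, the non-centred trick replaces a direct draw of the group effects with a standardized draw that is then scaled. In plain NumPy the two give the same marginal distribution (names here are illustrative, not from the notebook):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma = 0.5   # group-level sd (sigma_Z in the notebook, value assumed)
n_sites = 6

# Centred: draw the effects directly. In a hierarchical posterior this
# couples the effects to sigma and produces the classic funnel geometry.
centred = rng.normal(0.0, sigma, size=n_sites)

# Non-centred: draw standardized offsets, then scale by sigma.
# Marginally identical, but the sampler only sees independent N(0, 1)s.
z = rng.normal(0.0, 1.0, size=n_sites)
non_centred = sigma * z

# In PyMC3 the non-centred version would look like (hypothetical names):
#   z = pm.Normal('z', 0., 1., shape=n_sites)
#   site_effect = pm.Deterministic('site_effect', sigma_Z * z)
```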

However, I am still consistently seeing 1 or 2 divergences even with target_accept=.99. I think the problem is the Cauchy distribution for sigma, which is no longer recommended. For example, quoting Dan Simpson from a recent post:

There are a number of priors that decay away from the base model and give us some form of containment. Here is an incomplete, but popular, list of simple containment priors:

A half-Gaussian on the standard deviation
An exponential on the standard deviation
A half-t with 7 degrees of freedom on the standard deviation
A half-t with 3 degrees of freedom on the standard deviation
A half-Cauchy on the standard deviation.
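To see why the half-Cauchy has fallen out of favour as a default, compare how much prior mass each option leaves far out in the tail. A quick stdlib computation, assuming unit scale for both priors:

```python
import math

x = 10.0  # a value far in the tail, in units of the prior scale

# P(X > 10) under a half-normal(1) prior on the sd: essentially zero.
p_halfnormal = math.erfc(x / math.sqrt(2))       # ~1.5e-23

# P(X > 10) under a half-Cauchy(1) prior on the sd: still substantial.
p_halfcauchy = 1 - (2 / math.pi) * math.atan(x)  # ~0.064
```

So with a half-Cauchy roughly 6% of the prior mass sits above 10 scale units, which is exactly the kind of extreme region where the sampler struggles.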

Try using the following code to diagnose where the divergence is coming from:

import matplotlib.pyplot as plt
import pymc3 as pm

# Pair plot with the divergent samples highlighted
pm.pairplot(trace, varnames=['beta_X', 'sigma_Z',
                             'sigma', 'gamma_Z'], divergences=True, alpha=.01);

# Overlay the divergent traces (green) on the non-divergent ones (black)
tracedf = pm.trace_to_dataframe(trace)
divergent = trace['diverging']

_, ax = plt.subplots(1, 1, figsize=(15, 4))
ax.plot(tracedf.values[divergent == 0].T, color='k', alpha=.01)
ax.plot(tracedf.values[divergent == 1].T, color='C2', lw=.5)
labels = tracedf.columns
ax.set_xticks(range(len(labels)))
ax.set_xticklabels(labels, rotation=45, ha="right");

Then adjust the prior to put more weight on the more probable region (i.e., cut away the tail of the prior where the coefficient is unlikely to be observed).

This is all great info, thank you @junpenglao. As you suggested, and after looking at the plots, I tweaked some parameters to reduce the standard deviations (especially for sigma_Z and beta_X) and increased the acceptance target. This reduced the divergences quite a bit, though I still get some rare ones. I had sort of forgotten about this nice notebook on the PyMC3 website on how to diagnose a model. A good read; I should have studied it more carefully before. I am surprised, though, at how many tricky problems I faced doing this simple exercise…

The problem really is sigma_Z: you can see that the posterior is mostly below .5.
I would put a prior of sigma_Z = pm.HalfNormal('sigma_Z', .5).

Exactly. I set sigma_Z = pm.HalfNormal('sigma_Z', 1) and it seems to work well. Thank you again!