Hi,
I am working on a 3-level hierarchical model, which uses the offset technique described here and the structure from DataBozo.
The model looks like this:
It seems that I am having issues with the sampler getting stuck in some areas. I wonder if the problem could lie in either 1. the size of the problem (which contains about 400,000 observations), or 2. the parameterization of the standard deviation at every level of the hierarchy (which I am not taking into account in the model).
The model parameterizes the mean of the slope (m) and intercept (b) at every level J with information from level J-1. Would you suggest making the standard deviation of every slope (m) and intercept (b) at level J dependent on the standard deviation of every slope and intercept at level J-1? If so, what would be the best way to implement this? Thanks!
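(For readers unfamiliar with the offset technique mentioned above, here is a generic NumPy illustration of the idea, not the actual model from this post: drawing `b = mu_b + sigma_b * offset` with `offset ~ Normal(0, 1)` targets the same distribution as `b ~ Normal(mu_b, sigma_b)`, but the non-centered form usually gives NUTS a much better-behaved geometry in hierarchical models.)

```python
import numpy as np

rng = np.random.default_rng(0)
mu_b, sigma_b = 2.0, 3.0  # hypothetical group-level mean and sd
n = 200_000

# Centered form: sample b directly from its conditional prior.
b_centered = rng.normal(mu_b, sigma_b, size=n)

# Non-centered ("offset") form: sample a standard normal and rescale.
offset = rng.normal(0.0, 1.0, size=n)
b_offset = mu_b + sigma_b * offset

# Both parameterizations describe the same distribution.
print(b_centered.mean(), b_offset.mean())  # both close to mu_b
print(b_centered.std(), b_offset.std())    # both close to sigma_b
```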
These are the results of the traceplot. This might give a better idea of the problems observed.
You shouldn't initialize the NUTS sampler yourself; it will turn off the tuning during initialization.
Could you try:
trace = pm.sample(1000, tune=2000)

A Uniform prior for sigma is usually not a good idea; a HalfNormal is usually a better choice.
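(As an illustration of why, sketched with SciPy rather than PyMC3: a HalfNormal prior concentrates mass at plausible small values of sigma while still allowing arbitrarily large ones, whereas a Uniform(0, c) prior is flat up to an arbitrary cutoff c and assigns zero density beyond it.)

```python
from scipy import stats

# HalfNormal(scale=1): density is highest near 0 and decays smoothly,
# with no hard upper bound on sigma.
half_normal = stats.halfnorm(scale=1.0)

# Uniform(0, 5): flat up to the arbitrary cutoff, zero density after it.
uniform = stats.uniform(loc=0.0, scale=5.0)

print(half_normal.pdf(0.1))   # large: small sigmas are favored
print(half_normal.pdf(10.0))  # tiny but nonzero: large sigmas still possible
print(uniform.pdf(10.0))      # exactly 0: ruled out by the hard cutoff
```

In PyMC3 this corresponds to something like `sigma = pm.HalfNormal('sigma', sd=1.)` in place of a `pm.Uniform` prior on sigma.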

You might also want to experiment with different parameterizations of the design matrix, see e.g.: https://github.com/junpenglao/GLMM-in-Python/blob/master/Playground.py#L33-L40
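(A generic illustration of why reparameterizing the design matrix can help, not taken from the linked file: centering a predictor removes most of the near-collinearity between the intercept and slope columns, which shows up directly in the condition number of X.)

```python
import numpy as np

# Predictor far from zero: intercept and slope columns are nearly collinear,
# so the posterior for intercept and slope is strongly correlated.
x = np.linspace(100.0, 110.0, 50)
X_raw = np.column_stack([np.ones_like(x), x])

# Centering the predictor makes the columns orthogonal.
X_centered = np.column_stack([np.ones_like(x), x - x.mean()])

print(np.linalg.cond(X_raw))       # large: ill-conditioned
print(np.linalg.cond(X_centered))  # orders of magnitude smaller
```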
Thanks for the tips. I am implementing 1 and 2 for now. I will post the results when ready. In the meantime, what is your opinion about making the sigmas also follow a hierarchical structure? I wonder if this could also help shrink the deviation of every subgroup toward the deviation of the parent group.
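(One hypothetical way to express that idea, sketched generatively in NumPy rather than as a PyMC3 model: put the hierarchy on log sigma, so each child group's log standard deviation is drawn around the parent's, and the scale tau controls how strongly the child sigmas shrink toward the parent value. All names here are illustrative assumptions, not from the original model.)

```python
import numpy as np

rng = np.random.default_rng(1)

# Parent-level standard deviation (level J-1).
sigma_parent = 2.0

# Hierarchy on the log scale keeps every child sigma positive.
tau = 0.1  # small tau => strong shrinkage of child sigmas toward the parent
n_children = 8
log_sigma_children = np.log(sigma_parent) + tau * rng.normal(size=n_children)
sigma_children = np.exp(log_sigma_children)

print(sigma_children)  # all positive, all close to sigma_parent
```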
In my experience, you often don't have enough information to estimate hierarchical sigmas. I remember there is a blog post from Andrew Gelman on the difficulty of estimating population variance (in real life you almost never have enough groups to do that), but I cannot find the blog post right now. I did find these, which might be related:
https://statmodeling.stat.columbia.edu/2015/11/07/priors-for-coefficients-and-variance-parameters-in-multilevel-regression/
https://projecteuclid.org/download/pdf_1/euclid.ba/1340371048
Actually, a search found more results:
https://statmodeling.stat.columbia.edu/?s=hierarchical+variance+
I guess I need to catch up on my reading…
This is an area where the Horseshoe prior can serve as a good default. We covered the topic in "Default Bayesian analysis with global-local shrinkage priors".
Oddly, the Stan wiki on this topic only mentions the Horseshoe briefly in passing, and, when it does, it links to a paper that doesn't really address its utility as a default prior.
So here are the results. It seems the sampler gets stuck after tuning for 2000 and running about 5000 samples. @junpenglao I wonder if this would be an issue with the parameterization, the random seed, or max_treedepth?
@brandonwillard thanks for the input on the Horseshoe prior. Let me do a few more experiments in the meantime with the parameterization in PyMC3.
Most likely it is still an issue with the model parameterization.
Have you had any progress on this? I am trying to reparameterize a 3-level hierarchical model at the moment, and I'm not sure how to do it without losing the hierarchical connections. It's possible to create an independent Z value for each of the features, but then there is no hierarchy.