I am working on a 3-level hierarchical model that uses the offset technique described here and the structure from DataBozo.
The model looks like this:
It seems the sampler is getting stuck in some areas. I wonder whether the problem lies in 1. the size of the problem (about 400,000 observations), or 2. the parameterization of the standard deviation at every level of the hierarchy (which I am not currently modeling).
The model parameterizes the mean of the slope (m) and intercept (b) at every level J using information from level J-1. Would you suggest making the standard deviation of every slope (m) and intercept (b) at level J depend on the standard deviation of the slopes and intercepts at level J-1? If so, what would be the best way to implement this? Thanks!
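For context, by the offset technique I mean sampling a standard-normal offset and shifting/scaling it, instead of sampling each group parameter directly; the two are equivalent in distribution. A minimal NumPy illustration (names and values are arbitrary, not from my actual model):

```python
import numpy as np

rng = np.random.default_rng(0)
mu_b, sigma_b, n = 1.5, 0.5, 100_000

# Centered: sample the group-level intercept directly.
b_centered = rng.normal(mu_b, sigma_b, size=n)

# Non-centered (offset): sample a standard-normal offset, then shift and scale.
# This is the form that usually samples better with NUTS in hierarchies.
offset = rng.normal(0.0, 1.0, size=n)
b_offset = mu_b + offset * sigma_b

# Same distribution, different parameterization.
print(b_centered.mean(), b_offset.mean())  # both near 1.5
print(b_centered.std(), b_offset.std())    # both near 0.5
```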
These are the results of the trace plot. They might give a better idea of the problems observed.
- You shouldn’t initialize the NUTS sampler yourself; doing so turns off tuning during initialization.
Could you try:
```python
trace = pm.sample(1000, tune=2000)
```
- A Uniform prior for sigma is usually not a good idea - a HalfNormal is usually a better choice.
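For what it’s worth, the difference shows up directly in the densities: a HalfNormal concentrates mass near zero but keeps unbounded support, while a Uniform is flat up to a hard cutoff that can bias the posterior when mass piles up at the boundary. A small stdlib sketch (scale and bound here are arbitrary):

```python
import math

def halfnormal_pdf(x, scale=1.0):
    """Density of a HalfNormal(scale) prior: mass near zero, support [0, inf)."""
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) / scale * math.exp(-x * x / (2.0 * scale * scale))

def uniform_pdf(x, upper=10.0):
    """Density of a Uniform(0, upper) prior: flat inside, zero outside."""
    return 1.0 / upper if 0.0 <= x <= upper else 0.0

# HalfNormal gently prefers small sigma but still allows large values;
# Uniform treats sigma = 0.1 and sigma = 3.0 as equally plausible and
# forbids anything past its cutoff entirely.
print(halfnormal_pdf(0.1), halfnormal_pdf(3.0))
print(uniform_pdf(0.1), uniform_pdf(3.0), uniform_pdf(11.0))
```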
- You might also want to experiment with different parameterizations of the design matrix, see e.g.: https://github.com/junpenglao/GLMM-in-Python/blob/master/Playground.py#L33-L40
Thanks for the tips. I am implementing 1 and 2 for now and will post the results when ready. In the meantime, what is your opinion on making the sigmas follow a hierarchical structure as well? I wonder if this could also help shrink the deviation of every sub-group toward the deviation of the parent group.
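Concretely, what I have in mind is something like tying each child-group sigma to its parent multiplicatively on the log scale, so child sigmas shrink toward the parent value while staying positive. A NumPy prior-simulation sketch of the idea (all names and values hypothetical, not my actual model):

```python
import numpy as np

rng = np.random.default_rng(42)

sigma_parent = 1.0  # standard deviation at level J-1
spread = 0.1        # how tightly child sigmas cluster around the parent
n_children = 8

# Non-centered, log-scale version: sigma_child = sigma_parent * exp(spread * z)
# with z ~ Normal(0, 1). Small `spread` means strong shrinkage to the parent.
z = rng.normal(0.0, 1.0, size=n_children)
sigma_children = sigma_parent * np.exp(spread * z)

print(sigma_children)  # all close to 1.0, guaranteed positive
```

In PyMC3 terms this would presumably translate to a `Normal(0, 1)` offset on the log of sigma, scaled and exponentiated, though I have not tried it yet.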
In my experience, you often don't have enough information to estimate a hierarchical sigma. I remember a blog post from Andrew Gelman on the difficulty of estimating a population variance (in real life you almost never have enough groups to do that), but I cannot find it right now. I did find these, which might be related:
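The difficulty is easy to show in simulation: the group-level standard deviation is informed by essentially one draw per group, so with only a handful of groups the estimate is extremely noisy. A quick illustration (numbers chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
true_tau = 1.0  # population sd of the group effects
n_reps = 2000   # replicated datasets

def tau_estimates(n_groups):
    """Sample-sd estimates of tau across many replicated datasets."""
    effects = rng.normal(0.0, true_tau, size=(n_reps, n_groups))
    return effects.std(axis=1, ddof=1)

few = tau_estimates(5)     # 5 groups: estimates scatter widely around 1.0
many = tau_estimates(100)  # 100 groups: estimates concentrate near 1.0

print(few.std(), many.std())  # the spread shrinks with more groups
```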
Actually, a search found more results:
I guess I need to catch up on my reading…
This is an area where the Horseshoe prior can serve as a good default. We covered the topic in “Default Bayesian analysis with global-local shrinkage priors”.
Oddly, the Stan wiki on this topic mentions the Horseshoe only briefly in passing, and when it does, it links to a paper that doesn’t really address its utility as a default prior.
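For concreteness, the Horseshoe gives each coefficient a Normal prior whose scale is the product of a global half-Cauchy scale and a per-coefficient local half-Cauchy scale: the global scale pulls everything toward zero, while the heavy Cauchy tails let individual coefficients escape shrinkage. A prior-simulation sketch in NumPy (values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
tau = 0.1  # global scale: shrinks most coefficients toward zero

# Local scales: half-Cauchy tails let a few coefficients stay large.
lam = np.abs(rng.standard_cauchy(n))
beta = rng.normal(0.0, 1.0, size=n) * lam * tau

# The characteristic shape: most draws tiny, a few very large.
print(np.median(np.abs(beta)), np.abs(beta).max())
```

In PyMC3 this would look something like `lam = pm.HalfCauchy(..., beta=1, shape=K)`, `tau = pm.HalfCauchy(..., beta=1)`, `beta = pm.Normal(..., sd=tau * lam, shape=K)`, though treat that as a sketch rather than a tuned default.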
So here are the results. It seems the sampler gets stuck after 2000 tuning steps and about 5000 samples. @junpenglao I wonder whether this is an issue with the parameterization, the random seed, or the max_treedepth?
@brandonwillard thanks for the input on the Horseshoe prior. Let me run a few more experiments with the parameterization in pymc3 in the meantime.
Most likely it is still an issue with the model parameterization.