I have a complicated model with two levels of random effects and 156,000 ids on the lowest level (total number of rows ~2,200,000). The stats for all parameters except the intercept are fine (r-hat 1.00 or 1.01 on almost every one). The intercept is theoretically uninteresting. My first question is: can I ignore a bad r-hat on a parameter I'm not interested in?
My second question is how to deal with the bad r-hat. I tried increasing the sigma in the prior, but that only made it worse (I had sigma 5 previously, which was slightly better but still bad). I have also tried to find an appropriate acceptance rate for this particular problem, and I now use 0.5 to increase the step size compared to the default 0.8.
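For clarity, this is the setting I'm changing; everything else in the sampler call is left at its defaults (the model itself is omitted here):

```python
import pymc as pm

with model:  # `model` is the full hierarchical model described above
    # Lower target_accept -> larger step size; the PyMC default is 0.8
    idata = pm.sample(target_accept=0.5)
```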
Could you provide the code of your model? The high autocorrelation in your intercept may be reflecting issues with other parameters, or the mu and sigma you defined at the end of your post may not be ideal values for your intercept. I assume you used a Gaussian distribution? (Why mu=0 and sigma=7?)
Sure, here is a version that samples 8,000 groups (out of 156,000) and runs the model with PyMC: Google Colab. I use that to test different settings, since the full data set is rather slow to run.
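The subsampling itself is just something like this (the file path and column name here are placeholders, not the actual ones from the notebook):

```python
import numpy as np
import pandas as pd

# `df` stands in for the full data set; "full_data.parquet" and "group_id" are hypothetical names
df = pd.read_parquet("full_data.parquet")

rng = np.random.default_rng(42)
# Draw 8,000 of the ~156,000 lowest-level ids and keep only their rows
sampled_ids = rng.choice(df["group_id"].unique(), size=8_000, replace=False)
df_small = df[df["group_id"].isin(sampled_ids)].copy()
```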
I initially got the values for sigma from Bambi; I don't know how Bambi derives suitable priors. One thing I do know is that Bambi used a non-centred parametrisation, while my PyMC code uses a centred parametrisation (I think centred should be fine since there is quite a lot of data).
A non-centred reparameterization tends to help more when there is not much data per group, but sometimes it is required anyway to achieve convergence if the geometry of the posterior is too nasty. The Stan User's Guide has a good entry explaining the details: Stan User's Guide
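For reference, the two parameterizations look roughly like this in PyMC (the names and priors here are generic placeholders, not taken from your model):

```python
import pymc as pm

n_groups = 80  # e.g. number of countries; placeholder value

with pm.Model() as centered:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # Centred: group effects drawn directly from their population distribution
    group_effect = pm.Normal("group_effect", mu=mu, sigma=sigma, shape=n_groups)

with pm.Model() as non_centered:
    mu = pm.Normal("mu", 0, 1)
    sigma = pm.HalfNormal("sigma", 1)
    # Non-centred: sample standardized offsets, then scale and shift them
    offset = pm.Normal("offset", 0, 1, shape=n_groups)
    group_effect = pm.Deterministic("group_effect", mu + sigma * offset)
```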
Regarding your model, I see a big issue. You are adding an extremely large number of variables to the expectation, some forming interactions as well. I didn't check your model in detail, but unless you have very good clarity on how these variables relate (causally) to each other, you may induce unwanted spurious effects (e.g. those induced by a collider). You may want to check McElreath's Statistical Rethinking book (chapter 6 in the second edition, if I remember correctly, or lecture 6 here). This paper is also a great source for getting a notion of this topic: https://ftp.cs.ucla.edu/pub/stat_ser/r493.pdf
Personally, I'd try to build up the model incrementally (even starting from simulated data), to understand whether it's actually answering the question I want to ask. Maybe you can start with some prior predictive checks to better understand and tune your priors, if you don't have good prior information to help you define them. See here: Prior and Posterior Predictive Checks — PyMC 4.1.4 documentation
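A rough sketch of what I mean, using a toy logistic model with the mu=0, sigma=7 intercept prior you mentioned (the predictor and the other prior are just placeholders, not your actual model):

```python
import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = rng.normal(size=100)  # stand-in predictor

with pm.Model() as toy_model:
    intercept = pm.Normal("intercept", mu=0, sigma=7)
    beta = pm.Normal("beta", mu=0, sigma=1)
    p = pm.math.invlogit(intercept + beta * x)
    y = pm.Bernoulli("y", p=p, observed=np.zeros(100))

    # Draw from the priors only; the observed data are ignored here
    prior_pred = pm.sample_prior_predictive()

# With sigma=7 on the intercept, the implied probabilities pile up near 0 and 1,
# which is often not what you intend for a binary outcome model.
az.plot_dist(pm.math.invlogit(prior_pred.prior["intercept"]).eval())
```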
Indeed there are many variables in the model, and quite a few three-way interactions, so yes, the model is very complicated. But I think I need that level of complexity to answer the research questions. Basically, it concerns how wealth and culture (here "religion") affect the probability of educational deprivation for boys and for girls (and thus the gender gap in education). Wealth and culture each have separate measurements at the household level, the community level (which has a random intercept), and the national level (which has both a random intercept and a random slope for the effect of gender). I'll check out your links. The sample is from the general population, taken over a period of 30 years in 80 countries.
I started by running the model in lme4, where the opportunities for mistakes are fewer; I'll double-check that the model has the same parameters and that the estimates are in the same ballpark. And simulating from the priors is probably a very good idea! (I am new to PyMC, so I haven't tried that yet.)
I'm glad it helps. You may be interested in Bayesian workflows as well. This is a great paper on the topic: https://arxiv.org/pdf/2011.01808.pdf. PyMC has a great example here, and I have a much humbler/simpler example here. Good luck with the modelling.
One more thing: you were worried about the large number of fixed-effect parameters (and their interactions), so I'll remove all fixed effects except the intercept and see if the problem with the intercept remains.
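Something like this stripped-down version, with only the random intercepts left (the index arrays and names here are simulated placeholders, not my actual columns):

```python
import numpy as np
import pymc as pm

# Placeholder indices and outcome; in practice these come from the data frame
rng = np.random.default_rng(0)
n_communities, n_countries, n_obs = 200, 80, 5_000
comm_idx = rng.integers(n_communities, size=n_obs)
country_idx = rng.integers(n_countries, size=n_obs)
deprived = rng.integers(2, size=n_obs)

with pm.Model() as intercept_only:
    intercept = pm.Normal("intercept", mu=0, sigma=7)

    # Community-level random intercept
    sigma_comm = pm.HalfNormal("sigma_comm", 1)
    z_comm = pm.Normal("z_comm", 0, sigma_comm, shape=n_communities)

    # Country-level random intercept
    sigma_country = pm.HalfNormal("sigma_country", 1)
    z_country = pm.Normal("z_country", 0, sigma_country, shape=n_countries)

    p = pm.math.invlogit(intercept + z_comm[comm_idx] + z_country[country_idx])
    y = pm.Bernoulli("deprived", p=p, observed=deprived)

    idata = pm.sample(target_accept=0.5)
```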
That may or may not be affecting convergence. But the main issue that would concern me is not so much the large number of parameters as the large number of variables in the expectation. Their dependencies may be creating spurious associations, which may distort the estimates or over-/under-state the uncertainty. I don't know how your variables relate to each other, but it's usually a good idea to have a very good theoretical and practical notion of how they may associate (i.e. does variable x have an effect on variable z and on variable y, or vice versa, and what type of effect is it? Is it a chain, a fork, a collider? If so, is it a good idea or a bad idea to condition on z if I want to know the effect of x on y? etc.). You'll find more info on these topics in the links I shared in my first reply.
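As a tiny illustration of the collider case (purely simulated, nothing to do with your data): x and y are generated independently, yet once you also condition on z, which is caused by both, a spurious association between them appears.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)
y = rng.normal(size=n)          # x and y are truly independent
z = x + y + rng.normal(size=n)  # z is a collider: caused by both x and y

# Regression of y on x alone: slope near 0, as it should be
b_x = np.polyfit(x, y, 1)[0]

# Regression of y on x and z: conditioning on the collider induces
# a spurious negative association between x and y
X = np.column_stack([np.ones(n), x, z])
b_xz = np.linalg.lstsq(X, y, rcond=None)[0][1]

print(f"y ~ x:      slope on x ≈ {b_x:.3f}")   # ≈ 0
print(f"y ~ x + z:  slope on x ≈ {b_xz:.3f}")  # ≈ -0.5
```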