Hi @Benjamin, thanks for the detailed answer! BTW, regarding this:

> Please see below for a “version” of your problem, where I do not observe divergences when sampling
I am not having trouble with divergences for this simulated data set, only for my actual dataset, which has ~700 groups. In addition, many of the groups have very few observations.
The solution you provided makes total sense. I had read the blog post from PyMC Labs, but it had not occurred to me that it could be as simple as building a new model with the same priors and a new observed variable, and then using `sample_posterior_predictive` on the original trace. I guess it is the same as the way I normally make predictions, except that we introduce new groups. In this case the entire posterior is being reused, as opposed to just point estimates from the posterior being used as priors in a second model, correct? I guess that means that groups with few observations would just get regressed toward the group-level mean, with some potentially minor offset depending on the number of observations, right?
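Just to make sure I am reading the pattern right, here is a minimal sketch of what I understood (the hierarchy, data, and all variable names here are invented for illustration; I am assuming recent PyMC behavior where variables whose names match the trace are pulled from it rather than resampled):

```python
import numpy as np
import pymc as pm

# --- original model, fit on the observed data (hypothetical setup) ---
group_idx = np.repeat(np.arange(5), 20)        # 5 groups, 20 obs each
y = np.random.normal(size=100)                 # placeholder data

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 1.0)             # population mean
    tau = pm.HalfNormal("tau", 1.0)            # between-group sd
    mu_g = pm.Normal("mu_g", mu, tau, shape=5) # group means
    sigma = pm.HalfNormal("sigma", 1.0)        # observation noise
    pm.Normal("y", mu_g[group_idx], sigma, observed=y)
    trace = pm.sample()

# --- prediction model: same hyperparameter names, brand-new groups ---
new_group_idx = np.repeat(np.arange(3), 4)     # 3 unseen groups
with pm.Model() as pred_model:
    mu = pm.Normal("mu", 0.0, 1.0)             # same name -> taken from trace
    tau = pm.HalfNormal("tau", 1.0)            # same name -> taken from trace
    sigma = pm.HalfNormal("sigma", 1.0)        # same name -> taken from trace
    # new name -> not in the trace, so it is sampled forward from mu and tau
    mu_g_new = pm.Normal("mu_g_new", mu, tau, shape=3)
    pm.Normal("y_new", mu_g_new[new_group_idx], sigma,
              shape=len(new_group_idx))
    ppc = pm.sample_posterior_predictive(trace,
                                         var_names=["mu_g_new", "y_new"])
```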
Just one technical clarification… It totally makes sense that `mu_g` and `mu_g_new` can’t have the same name or there would likely be shape errors, but does it matter that `mu`, `tau`, and `sigma` keep the same names? Would it be better to use `pm.Flat` to make certain that those variables are not resampled but rather grabbed directly from the posterior? I guess either is valid according to this, but I just wanted to double check.