Dependence of convergence on random seed

I’m trying to fit an hierarchical model to data (~10 features, ~500 data points) and I’ve encountered something I’m not sure how to handle. After specifying the priors and the structure of the model, I’ve sampled the model and was very pleased with the results - posterior distributions “made sense” and model diagnostics were also pretty good. However… when I tried to reproduce the result, I failed - sometimes the chains of the fitted models did not converge at all, sometimes they converged but hadn’t mixed, etc…

I played around with using different seeds and indeed it seems like for some seeds I’m getting “what I want” and for some seeds I’m not. Is there any “best practice” for such a situation? Manually finding “a seed that works” feels like bad practice… :slight_smile:

Any tips/refs/links about how to approach this would be very appreciated.

Usually I will analysis the cases where it does not work:

  • lots of divergence. solution: more informative prior
  • some chain is a flat line. solution: likely initial start point problem, manually supply start
  • bad mixing (random walk). solution: this is a tricky one… likely the mass matrix is adapted to a local geometric, manually start point might help. otherwise, you might want to try smarter mass matrix adaptation like the one in

Thanks @junpenglao!

Most of the times the problem is bad mixing (it has to be the tricky case, obviously). I’ve tried using DFM’s helper function and tune the off-diagonal entires, but I’m actually getting worse results.

My current understanding is that there are “multiple possible solutions”, and the bad mixing is an indication of the sampler finding different solutions from different starting points - is this a plausible explanation? Some of these solutions don’t make much sense; to constrain the search space some more (in addition to “tightening” the priors) I’m trying to add order constraints using pm.Potential('a',X-Y). Other ideas on how to improve mixing would be much appreciated. :slight_smile:

Likely that’s the case.
Do you have normalization in your model like x/x.sum()? if so you should change them into better parameterization.
Order constraints are better done using transform.Order

Indeed I have some sort of normalization! A slightly different kind of, though:

normalize = (1-s)*(1-Lh)+s*(1-Ls)
events = pm.Poisson('events',(x_1*(1-s)*(1-Lh)+x_2*s*(1-Ls))*exposure/normalize)

May I ask what was the hint that suggested the normalization? What’s happening in the code above is that I’m approximating a sum of two binomials with probabilities x_1 and x_2, and different sample sizes using a Poisson distribution. Is there a better way to parametrize this, or avoid the normalization?

Regarding transform.Order - I’ve tried to find an example and found this one, in which rates is a 3-numbers vector, constrained to be ordered (if I understood the example correctly). I have a slightly different situation, in which x_1 and x_2 are coming from two different (upstream) GLMs, and I want to contrain x_1>x_2. Is this possible?