How to debug a model when most chains sample fine, but a few do not?

I have a rather complex model that samples well most of the time. But sometimes one of the chains samples very differently from the others. For example, here is the plot_trace for one of the RVs in one of the sampling runs:

The chains look pretty consistent. \hat{R} is 1.0; ESS is 607 from 2400 samples.

In another sampling run with the same model and the same data, the plot_trace for the same RV looks like this:

As you can see, one of the chains has a very different idea about the likely value for that RV: it is shifted more than 1.0 to the right and also has more variance. Unsurprisingly, the sampling statistics are awful. \hat{R} is 1.52; ESS is 7.

Is this common or unusual? How do I debug this? How can I determine why it is happening and adjust my model?

And do I need to run a lot more chains to detect this problem? I was under the impression that 4 chains was enough, but perhaps I have been living under a rock, as Christian Luhmann suggests.

Try running a large number of chains. On the order of tens or so. It’s easy to be fooled by just 4 chains.
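Once you have draws from many chains, a leave-one-out check can help point at the chain that disagrees. Here is a rough sketch of that idea. It uses synthetic draws instead of a real trace, and `rhat` and `find_stray_chain` are hypothetical helpers I made up for illustration. Note that this is the classic (non-split) Gelman-Rubin statistic, so the numbers will not match the rank-normalized value that ArviZ reports.

```python
import random
import statistics


def rhat(chains):
    """Classic (non-split) Gelman-Rubin R-hat for equal-length chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # Average within-chain variance.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # Between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


def find_stray_chain(chains):
    """Return (index, R-hat without it) for the chain whose removal lowers R-hat most."""
    results = []
    for i in range(len(chains)):
        rest = chains[:i] + chains[i + 1:]
        results.append((rhat(rest), i))
    best_rhat, best_idx = min(results)
    return best_idx, best_rhat


# Synthetic stand-in for posterior draws (an assumption, not real model output):
# three chains that agree and one that is shifted and wider.
rng = random.Random(0)
chains = [[rng.gauss(0.0, 1.0) for _ in range(600)] for _ in range(3)]
chains.append([rng.gauss(1.2, 1.5) for _ in range(600)])

print("R-hat, all chains:", round(rhat(chains), 2))
idx, r = find_stray_chain(chains)
print("stray chain:", idx, "| R-hat without it:", round(r, 2))
```

With a real trace you would pull the per-chain draws for the RV out of the trace object and pass them in. If dropping a single chain brings R-hat back near 1, look at where that chain started and at its other parameters to see which mode it ended up in.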

Multimodality usually means two or more distinct configurations of the parameter space are equally plausible. If you don’t believe one or more of them, you can try to encode that at the prior level.

Some models may also have natural symmetries, like mixture models. Those can be tackled (again) by priors that arbitrarily restrict the sampler to one half of the symmetry.
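To make the symmetry point concrete, here is a small sketch with a toy two-component mixture. Everything in it is an assumption for illustration (equal weights, unit variances, wide normal priors, made-up data and function names); it is not your model. It only shows that swapping the component labels leaves the posterior unchanged, and that a prior with zero mass where `mu_a >= mu_b` keeps just one of the two mirrored modes.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of Normal(mu, sigma) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))


def mixture_loglik(data, mu_a, mu_b):
    """Equal-weight two-component normal mixture with unit variances."""
    total = 0.0
    for x in data:
        la = log_normal_pdf(x, mu_a, 1.0)
        lb = log_normal_pdf(x, mu_b, 1.0)
        m = max(la, lb)  # log-sum-exp trick for numerical stability
        total += m + math.log(0.5 * math.exp(la - m) + 0.5 * math.exp(lb - m))
    return total


def log_prior(mu_a, mu_b, ordered):
    """Wide normal priors; optionally no prior mass where mu_a >= mu_b."""
    if ordered and mu_a >= mu_b:
        return -math.inf
    return log_normal_pdf(mu_a, 0.0, 10.0) + log_normal_pdf(mu_b, 0.0, 10.0)


def log_posterior(data, mu_a, mu_b, ordered=False):
    """Unnormalized log posterior of the two component means."""
    lp = log_prior(mu_a, mu_b, ordered)
    if lp == -math.inf:
        return lp
    return lp + mixture_loglik(data, mu_a, mu_b)


data = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.0]  # made-up observations

# Without the constraint, both labelings are equally plausible (two modes).
print(log_posterior(data, -2.0, 2.0), log_posterior(data, 2.0, -2.0))
# With the constraint, only the labeling with mu_a < mu_b keeps posterior mass.
print(log_posterior(data, -2.0, 2.0, ordered=True),
      log_posterior(data, 2.0, -2.0, ordered=True))
```

In PyMC the same idea is usually expressed with an ordered transform on the component means rather than a hand-written prior; check the documentation for your version for the exact spelling.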
