Hierarchical changepoint detection

@fsosa I built a model which allows for an unknown number of changepoints. It includes discrete latent variables, so NUTS can’t be used for the whole thing, and as a consequence inference requires many more iterations than normal. The basic probability model is a linear Gaussian model in which each period has its own mean, with a single variance parameter shared across all periods. You’ll also need to specify an upper bound on the possible number of changepoints; this bound can be much larger than the number you reasonably expect to find.
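To make the likelihood concrete, here is a minimal NumPy sketch of the generative side only: a piecewise-constant mean with one shared standard deviation. It is not the notebook's code; the function name `simulate_piecewise_gaussian` and its arguments are made up for illustration, and the changepoints are passed in as known values rather than inferred.

```python
import numpy as np

def simulate_piecewise_gaussian(n_obs, changepoints, means, sigma, seed=0):
    """Draw a series whose mean is constant within each period.

    Hypothetical helper for illustration: `changepoints` are time indices
    where a new period starts, `means` has one entry per period
    (len(changepoints) + 1), and `sigma` is shared by all periods.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_obs)
    cps = np.sort(np.asarray(changepoints))
    # Period index = number of changepoints at or before each time point.
    segment = np.searchsorted(cps, t, side="right")
    mu = np.asarray(means, dtype=float)[segment]
    y = rng.normal(mu, sigma)
    return y, segment

y, segment = simulate_piecewise_gaussian(
    n_obs=100, changepoints=[30, 70], means=[0.0, 5.0, -2.0], sigma=1.0
)
```

Inference then has to go the other way: recover the changepoint locations, the per-period means and the shared variance from `y` alone.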

Some of the tricks used here include sorting the changepoints into order, as @drbenvincent mentioned, and placing a shrinkage prior on the number of active changepoints.
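A rough sketch of how those two tricks can fit together with the upper bound, in plain NumPy. The specifics are my assumptions for illustration and may differ from the notebook: each of `k_max` candidate changepoints gets a binary "active" indicator, the inclusion probability is drawn from a Beta(1, b) prior with b large so that few are switched on, and inactive candidates are pushed to infinity before sorting so they never split the series. The names `segment_index` and `sample_prior` are hypothetical.

```python
import numpy as np

def segment_index(t, locations, active):
    """Map each time point to a period, using only the active changepoints."""
    locations = np.asarray(locations, dtype=float)
    active = np.asarray(active, dtype=bool)
    # Inactive candidates are moved to +inf, so no time point ever passes them.
    effective = np.where(active, locations, np.inf)
    # Sorting removes the label-switching symmetry between candidates.
    ordered = np.sort(effective)
    return np.searchsorted(ordered, t, side="right")

def sample_prior(k_max, n_obs, a=1.0, b=10.0, seed=0):
    """Draw candidate locations and indicators from an assumed shrinkage prior."""
    rng = np.random.default_rng(seed)
    p_active = rng.beta(a, b)               # mass near 0 favours few changepoints
    active = rng.random(k_max) < p_active   # discrete latent indicators
    locations = rng.uniform(0, n_obs, size=k_max)
    return locations, active

# Three candidates, the middle one switched off: only 5 and 7 split the series.
seg = segment_index(np.arange(10), locations=[7, 3, 5], active=[True, False, True])
```

The indicators are the discrete part that a gradient-based sampler cannot handle directly, which is why a sampler for this kind of model typically ends up mixing different step methods.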

You can run the model as a Colab notebook here. While it gets many thousands of divergences during sampling, it does get at least a few samples which don’t diverge!