I’m trying to build a changepoint detection model where both the number of changepoints is sampled *as well as* their respective locations. From what I understand, this would mean the number of switch statements within the defined model is dynamic, depending on the number of changepoints sampled. So far, this kind of hierarchical model can’t be expressed with PyMC. Am I correct, or is there a way to express this model in PyMC?

**I have no idea whether this would work**, but you could try to create a vector of change point parameters of length N, where you have a discrete prior over N. And you’d probably want those change points to be ordered.

@fsosa I built a model which allows for an unknown number of changepoints. It includes discrete latent variables, so NUTS can’t be used for the whole thing and as a consequence the inference requires many more iterations than normal. The basic probability model is a linear Gaussian model in which each period has a distinct mean value and a single variance parameter for all periods. You’ll also need to specify an upper bound on the possible number of changepoints which may be much larger than the number you reasonably expect to find.

Some of the tricks used here include sorting / ordering the changepoints like @drbenvincent mentioned as well as using a shrinkage prior on the number of active changepoints.

You can run the model as a Colab notebook here. While it gets many thousands of divergences during sampling, it does get at least a few samples which don’t diverge!

Neat! Although that many divergences is a massive red flag.

I’ve only had time to scan it, but a few thoughts.

- I wondering @ricardoV94 or @lucianopaz could let us know how legit it is to put a prior on a structural component of the model (e.g. the length of another parameter vector
- You can’t get gradients because there’s an implicit step function at each change point. So if you model the changes with a continuous function (like a sigmoid) then you can get gradient information.
- I’m not convinced that this is hierarchical. More like “multiple” changepoints.
- I’d be tempted to use a Dirichlet distribution over the changepoints and just scale them to the x extent of your data. That way they are innately ordered.

So I made a 2nd pass (same Colab notebook) on the model and got rid of the binary latent variables by trying out a new representation for active/inactive changepoints with a sigmoid function, but not as a replacement for the > operator. No divergences now!

Yes that’s valid (as in you will get the right logp/dlogp), but we don’t have any samplers that expect RV sizes to change across iterations, and xarray would cry as well. Allow step methods to work with variables that have dynamic sizes · Issue #5139 · pymc-devs/pymc · GitHub

This model was really asking me to try the new experimental functionality to marginalize discrete variables: Automatic marginalization of discrete variables in the logp by ricardoV94 · Pull Request #91 · pymc-devs/pymcx · GitHub

Marginalizing over the number of changepoints seems to really help NUTS, and the posteriors look somewhat sensible. Here is the notebook: multiple_changepoint_marginalization.ipynb · GitHub

Really appreciate feedback, specially if I missed something obvious. You will have to checkout that PR branch locally if you want to try it out.

Wow, that is awesome! There’s probably a little label switching going on hence a few high \hat{R} values but the results look spot on. This is very exciting.

Oh of course… I should sort the variables before checking their r_hat

Hey @ckrapu, I ran the code from your notebook as it is, but got the following plot instead. Do you have any idea what may be causing this?

I just reran it one more time and got the same results as when I originally executed the notebook. Perhaps it’s a random seed issue - do you want to try running it again? To make sure we’re on the same version, I named the notebook checkpoint “v1.0.0” in Colab.