Modeling a Spectrum of Gaussians with SMC

Often you can predict quantities of interest without ever assigning data items to clusters. For example, you can estimate the probability that two data items are in the same cluster, or you can generate posterior predictive data and make predictions from it.
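As a rough sketch of the first kind of question, here is one way to estimate the same-cluster probability from posterior draws using only NumPy. The function names and the arrays `w`, `mu`, and `sigma` are my own stand-ins for draws of the mixture weights, means, and scales (shape draws by components); they are simulated here, not taken from a real fit. Given the parameters, the two assignments are independent, so the probability is the sum over components of the product of the two responsibilities, averaged over draws. That quantity does not change when labels are permuted, which is why label switching doesn't hurt it.

```python
import numpy as np

def normal_logpdf(y, mu, sigma):
    # Elementwise log density of Normal(mu, sigma) at y.
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2

def responsibilities(y, w, mu, sigma):
    # P(z = k | y, theta) for each draw; w, mu, sigma have shape (S, K).
    log_r = np.log(w) + normal_logpdf(y, mu, sigma)
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)

def prob_same_cluster(y_i, y_j, w, mu, sigma):
    # Average over draws of sum_k P(z_i = k | ...) * P(z_j = k | ...).
    r_i = responsibilities(y_i, w, mu, sigma)
    r_j = responsibilities(y_j, w, mu, sigma)
    return float(np.mean(np.sum(r_i * r_j, axis=1)))

# Simulated stand-in for posterior draws (an assumption, not real output).
rng = np.random.default_rng(0)
S = 500
w = np.tile([0.5, 0.5], (S, 1))
mu = np.column_stack([rng.normal(-3, 0.1, S), rng.normal(3, 0.1, S)])
sigma = np.ones((S, 2))

# Swap labels in a random subset of draws to mimic label switching.
swap = rng.random(S) < 0.5
mu[swap] = mu[swap][:, ::-1]

p_near = prob_same_cluster(-3.1, -2.9, w, mu, sigma)  # two points near one mode
p_far = prob_same_cluster(-3.0, 3.0, w, mu, sigma)    # one point near each mode
print(p_near, p_far)
```

With well-separated modes like these, the first probability should come out close to 1 and the second close to 0, regardless of which draws had their labels swapped.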

The only multimodality here comes from the two modes collapsing onto each other and from label switching. You should be able to prevent the former by initializing the modes apart. The same mode collapse problem can arise in SMC.

That’s one way to go, but it causes implicit label switching and thus can be hard for the sampler to adapt to and sample from. An alternative that would be kosher according to the “generative” intentions of PyMC would be to do something like z ~ lognormal(mu, sigma) (or some other positive-constrained prior) and then set y = cumulative_sum(z). In Stan, we just transform with cumulative_sum(exp(x)), where x is unconstrained, apply the Jacobian correction, and then put a prior directly on y.
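To make the transform concrete, here is a minimal NumPy sketch of the cumulative_sum(exp(x)) map described above, not Stan's or PyMC's actual implementation. The function name `ordered_from_unconstrained` is hypothetical. The Jacobian of this map is lower triangular with exp(x_k) on the diagonal, so the log absolute determinant is just sum(x); the code checks that against a direct determinant.

```python
import numpy as np

def ordered_from_unconstrained(x):
    # Map unconstrained x to positive, strictly increasing y = cumsum(exp(x)).
    x = np.asarray(x, dtype=float)
    y = np.cumsum(np.exp(x))
    # dy_k/dx_j = exp(x_j) for j <= k, so the Jacobian is lower triangular
    # and log|det J| is the sum of the log diagonal entries, i.e. sum(x).
    log_abs_det_jacobian = np.sum(x)
    return y, log_abs_det_jacobian

x = np.array([-0.5, 0.2, 1.0])
y, log_jac = ordered_from_unconstrained(x)

# Build the Jacobian explicitly and compare determinants as a sanity check.
J = np.tril(np.ones((x.size, x.size))) * np.exp(x)[None, :]
sign, logdet = np.linalg.slogdet(J)

print(y)                # strictly increasing, all positive
print(log_jac, logdet)  # the two values should agree
```

When the prior is placed directly on y, the target log density over x is the log prior evaluated at y plus `log_jac`. In the generative version with z ~ lognormal and y = cumulative_sum(z), no separate correction is needed because the prior is on z itself.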
