SMC Sampler references

Hi PyMC team, and in particular @ricardoV94, @aloctavodia, and @junpenglao, who seem to have worked a lot on the SMC module.

I was thrilled to find an SMC sampler implementation in a proper, established package.

I was wondering, though: how close is this to SMC samplers à la Del Moral, Doucet, etc.?

Indeed, I was surprised to see the implementation cite Ching and Chen (2007) and Minson et al. (2013), who do not use terms like “SMC samplers”, “ABC SMC”, or “IMH”, but instead use terms like “Transitional MCMC” and “CATMIP”, which in turn don’t appear in the code.

I come from a computational Bayesian statistics background (I did my PhD 15 years ago on adaptive SMC for state-space models), and I wasn’t familiar with those articles; I know much better the Del Moral, Doucet, and Jasra (2006) work on SMC samplers, and generally the bibliography in Chapter 17 of Chopin and Papaspiliopoulos (2020).

To be clear: this is really not a “hey, you didn’t cite so-and-so from my gang” academic attribution call :laughing: It is rather a genuine “Oh, is this what I already know? Can I be lazy and recycle my knowledge (and the theoretical guarantees, etc.), or is there a conceptual gap here?”. I didn’t know that other branch of research on the topic, over in the engineering and geophysics journals, and I’m now wondering how similar your code is to what I already know and trust, versus needing to dive into two papers I didn’t know.

Thanks for any clarification!

References:

Ching, Jianye, and Yi-Chu Chen. 2007. ‘Transitional Markov Chain Monte Carlo Method for Bayesian Model Updating, Model Class Selection, and Model Averaging’. Journal of Engineering Mechanics 133 (7): 816–32. https://doi.org/10.1061/(ASCE)0733-9399(2007)133:7(816).

Chopin, Nicolas, and Omiros Papaspiliopoulos. 2020. ‘SMC Samplers’. In An Introduction to Sequential Monte Carlo, 329–55. Springer Series in Statistics. Cham: Springer International Publishing.

Del Moral, Pierre, Arnaud Doucet, and Ajay Jasra. 2006. ‘Sequential Monte Carlo Samplers’. Journal of the Royal Statistical Society. Series B (Statistical Methodology) 68 (3): 411–36.

Minson, S. E., M. Simons, and J. L. Beck. 2013. ‘Bayesian Inversion for Finite Fault Earthquake Source Models I—Theory and Algorithm’. Geophysical Journal International 194 (3): 1701–26.

Hi there!
Yes, the current SMC implementation is an SMC sampler (not to be confused with a particle filter, which usually describes a different, albeit closely related, algorithm applicable in a slightly different (streaming) problem context). The PyMC code is based closely on [1], which advocates the use of a specific (sub-optimal) L-kernel [2] because it uses ABC (where other L-kernels would be impossible to use). Hope this clears things up a bit!

[1] Del Moral, P., Doucet, A., & Jasra, A. (2012). An adaptive sequential Monte Carlo method for approximate Bayesian computation. Statistics and Computing, 22, 1009–1020.
[2] Green, P. L., Devlin, L. J., Moore, R. E., Jackson, R. J., Li, J., & Maskell, S. (2022). Increasing the efficiency of Sequential Monte Carlo samplers through the use of approximately optimal L-kernels. Mechanical Systems and Signal Processing, 162, 108028.
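
For reference, a minimal sketch of how you would invoke it in PyMC (toy model of mine, just to show the entry point):

```python
import pymc as pm

with pm.Model():
    # Toy model: Gaussian mean with a wide prior.
    mu = pm.Normal("mu", 0.0, 10.0)
    pm.Normal("obs", mu, 1.0, observed=[1.2, 0.8, 1.5])
    # Tempered SMC sampler: starts from the prior (beta=0) and
    # adapts the beta increments until the posterior (beta=1).
    idata = pm.sample_smc(draws=2000, chains=4, random_seed=42)
```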

You can also take a look at 11. Appendiceal Topics — Bayesian Modeling and Computation in Python, which goes into more detail. The citations we give there are a bit more comprehensive as well.

Just to add a historical note: if I remember correctly, the first implementation of an SMC sampler in PyMC was done by someone with a geology background (which could explain the origin of the citations). I even think it was named Transitional MCMC or similar. The implementation was later rewritten into something much closer to the current one, and we adopted the SMC name.

Ah fantastic, thanks @junpenglao and @aloctavodia, this all makes sense now: why the code indeed looks very much like the SMC samplers of Del Moral, Doucet, etc., and why the references are to Transitional MCMC. And thanks for pointing to a reference for the backward L-kernel!

In practice, do you recommend IMH as a forward kernel, or a more local RWMH? I have quite a narrow target distribution (very correlated pairs of parameters of a nonlinear ordinary differential equation), which might also exhibit funnels (it is a hierarchical model), so I’m thinking an RWMH proposal might allow better exploration from the few high-weight “particles” that manage to land in the funnel, while still ensuring coverage of the rest of the broad mass.

Ideally, a mixture of IMH and RWMH would be the dream, maybe keeping an auxiliary variable to make the weights easy to compute without integrating over the two components of the mixture.
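
To make that concrete, here is a rough sketch of what I have in mind (plain NumPy/SciPy, all names mine): draw the component index first (that is the auxiliary variable), then accept/reject with that component’s own ratio. Each component kernel is pi-invariant on its own, so the mixture is too, and the mixture proposal density never needs evaluating.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mh_mixture_step(x, logp, imh_mean, imh_cov, rw_scale, p_imh, rng):
    """One MH step mixing an IMH and a RWMH kernel.

    The component index drawn below is the auxiliary variable: each
    branch uses only its own acceptance ratio, so no integration over
    the two proposal densities is needed.
    """
    if rng.random() < p_imh:
        # Independent proposal, e.g. a Gaussian fitted to the current
        # weighted particle cloud in an SMC sampler.
        y = rng.multivariate_normal(imh_mean, imh_cov)
        log_q = multivariate_normal.logpdf
        log_alpha = (logp(y) - logp(x)
                     + log_q(x, imh_mean, imh_cov)
                     - log_q(y, imh_mean, imh_cov))
    else:
        # Symmetric random walk: proposal densities cancel.
        y = x + rw_scale * rng.standard_normal(x.shape)
        log_alpha = logp(y) - logp(x)
    return y if np.log(rng.random()) < log_alpha else x
```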

Any practical guidelines, please? Back in the day I used to cook up my own proposal kernels for nonlinear state-space models (i.e. particle filters, not SMC samplers), but that was a looong while ago.

In my experience, IMH behaves better than RWMH, and funnels can be tricky for both. With Carlos Iguaran (he is working on SMC in BlackJAX) we have discussed hybrid kernels, but the focus was/is on using different kernels for different variables, like HMC for continuous variables and IMH for discrete ones. But that’s just an idea for the moment; we have not thought of a weighted mixture of kernels as you are suggesting. With Carlos, we also discussed diagnostics for SMC, but that still needs more work. In the meantime, I simply use r-hat and rank plots as a guide, but they may look totally fine even if you are missing parts of a funnel.

Ah, nice idea for the per-variable kernels. Unfortunately, my continuous variables are parameters of an ODE whose stiffness varies a lot across the parameter space: HMC diverges during the leapfrog integration whenever a step takes it out of the well-behaved typical set, and there’s not much I can do (or know how to do!) to reparametrize this chemistry-based ODE. As a result, there are HMC divergences pretty much everywhere in the space, without a clear pattern (unlike a funnel).

Diagnostics for SMC would be very interesting. R-hat sorta makes sense, but given the non-locality of the SMC sampler, I suspect R-hat will look pretty good: all the instances of the SMC sampler will behave similarly with high probability, even if they miss minor modes, as long as they use the same initial distribution (i.e. the prior at beta=0). With high probability they will all miss the same modes, unlike very local MCMC chains that get trapped in different places. So a good R-hat would give a false sense of confidence, I’m afraid. Same for the rank test: we would get very similar distributions, I suspect.

A random idea for a diagnostic: change the initial proposal, and possibly even the threshold on the ESS used for the adaptive tempering, so that you end up with different tempering schedules, i.e. different sequences of distributions, i.e. more difference between the parallel instances.
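
Something like this sketch, say (assuming a `model` defined elsewhere, and assuming PyMC forwards a `threshold` argument, the ESS fraction driving the adaptive beta increments, to the SMC kernel, as recent versions seem to do):

```python
import arviz as az
import pymc as pm

# Rerun SMC under different ESS thresholds, hence under different
# adaptive tempering schedules; disagreement between the resulting
# posteriors hints at mass (e.g. a minor mode) being missed.
summaries = []
for threshold in (0.3, 0.5, 0.8):
    with model:  # hypothetical: your already-built PyMC model
        idata = pm.sample_smc(draws=2000, chains=4, threshold=threshold)
    summaries.append(az.summary(idata, kind="stats"))
```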

Btw, have you considered doing IMH and/or RWMH with a fat-tailed proposal? I’m thinking of replacing the Gaussian with a multivariate Student-t, using the continuous Gaussian-Gamma mixture decomposition used by Liu and Rubin (1995, Section 2) and Peel and McLachlan (2000, Section 3), which I used in https://arxiv.org/pdf/1108.2836.pdf [under equation 5.12]. [Note: my own paper is not the clearest, I was still young; I’m mostly putting it here out of pure vanity :laughing:]

If you haven’t tried it, happy to discuss how to code it!
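
For instance, a minimal sketch of the sampling side of the decomposition (plain NumPy; `w` is the auxiliary Gamma variable):

```python
import numpy as np

def rmvt(mean, cov, df, rng):
    """Draw from a multivariate Student-t with df degrees of freedom
    via the Gaussian-Gamma scale mixture:
        w ~ Gamma(df / 2, rate=df / 2),  x | w ~ N(mean, cov / w).
    Keeping w around makes the proposal conditionally Gaussian, so
    the MH weights stay cheap to compute."""
    w = rng.gamma(shape=df / 2.0, scale=2.0 / df)  # NumPy's scale = 1 / rate
    return rng.multivariate_normal(mean, np.asarray(cov) / w)

rng = np.random.default_rng(0)
draw = rmvt(mean=np.zeros(2), cov=np.eye(2), df=3.0, rng=rng)
```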

Peel, D., and G. J. McLachlan. 2000. ‘Robust Mixture Modelling Using the t Distribution’. Statistics and Computing 10 (4): 339–48.
Liu, C., and D. B. Rubin. 1995. ‘ML Estimation of the t Distribution Using EM and Its Extensions, ECM and ECME’. Statistica Sinica 5 (1): 19–39.