What is the practical difference between Sequential Monte Carlo sampling on its own and SMC with Approximate Bayesian Computation?
My data is similar to that shown here, in that I’m modeling the signal resulting from a biochemical reaction with a mechanistic model that has four parameters (plus time), for which I have ~50 observations over time. I’m estimating those parameters with a hierarchical Bayesian model that has ~200 parameters and ~10,000 observed points, so NUTS is… slow. It also seems a bit unreasonable, and unwise, to evaluate the likelihood of each observed point independently, so the SMC-ABC approach seems well-suited.
Is there an inherent performance difference, in computation time or accuracy, between using the Simulator method for ABC (like this) vs writing a custom likelihood (like this) or a Theano Op (like this)? The Simulator seems like the right way to go to me, but I want to make sure I’m not missing some subtleties.
SMC-ABC is for models where you can simulate “fake” observations but cannot compute the likelihood (or where evaluating the likelihood is too expensive). You can use summary statistics, which I guess is what you mean by evaluating the likelihood for all data points together instead of for each observation independently.
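To make the idea concrete, here is a minimal sketch of plain rejection ABC in pure Python. It is not PyMC3's Simulator or its SMC-ABC sampler; the first-order decay model, the uniform prior, the RMSE distance, and the tolerance `epsilon` are all assumptions picked for illustration. The only point is that each draw is accepted or rejected by comparing a whole simulated curve with the observed one, and no likelihood is ever evaluated.

```python
import math
import random


def simulate(k, times, rng, amplitude=1.0, noise_sd=0.05):
    """Hypothetical mechanistic model: noisy first-order decay."""
    return [amplitude * math.exp(-k * t) + rng.gauss(0.0, noise_sd) for t in times]


def distance(sim, obs):
    """Root-mean-square distance between two whole curves."""
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(obs))


def abc_rejection(obs, times, n_draws, epsilon, rng):
    """Keep prior draws whose simulated curve lands within epsilon of the data."""
    accepted = []
    for _ in range(n_draws):
        k = rng.uniform(0.0, 2.0)  # assumed prior: k ~ Uniform(0, 2)
        if distance(simulate(k, times, rng), obs) < epsilon:
            accepted.append(k)
    return accepted


rng = random.Random(0)
times = [0.2 * i for i in range(50)]        # ~50 observations over time
true_k = 0.7                                # value used to fake the "observed" data
observed = simulate(true_k, times, rng)

posterior_k = abc_rejection(observed, times, n_draws=5000, epsilon=0.1, rng=rng)
posterior_mean_k = sum(posterior_k) / len(posterior_k)
print(len(posterior_k), "accepted draws, posterior mean of k:", round(posterior_mean_k, 3))
```

SMC-ABC improves on this by shrinking the tolerance over a sequence of stages and reusing the accepted particles, rather than sampling from the prior every time.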
In terms of performance, I would say SMC-ABC is less well-tuned than NUTS, so you should spend time evaluating the inference output with posterior predictive checks to make sure the output makes sense.
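A posterior predictive check of this kind can be sketched in a few lines of plain Python. Everything here is an assumption for illustration: the decay model, the choice of test statistic (mean signal), and especially `posterior_k`, which is a stand-in for draws you would get from your own sampler.

```python
import math
import random

rng = random.Random(1)
times = [0.2 * i for i in range(50)]


def simulate(k, noise_sd=0.05):
    """Hypothetical model: noisy first-order decay with rate k."""
    return [math.exp(-k * t) + rng.gauss(0.0, noise_sd) for t in times]


def statistic(curve):
    """Test statistic compared between data and replicates: mean signal."""
    return sum(curve) / len(curve)


observed = simulate(0.7)  # fake "observed" data for the sketch

# Stand-in for posterior draws; replace with draws from SMC, SMC-ABC or NUTS.
posterior_k = [rng.gauss(0.7, 0.03) for _ in range(500)]

# One replicated dataset per posterior draw, reduced to the test statistic.
replicated_stats = [statistic(simulate(k)) for k in posterior_k]
obs_stat = statistic(observed)

# Posterior predictive p-value: values very close to 0 or 1 suggest misfit.
p_value = sum(s >= obs_stat for s in replicated_stats) / len(replicated_stats)
print("posterior predictive p-value:", p_value)
```

In practice it is worth also plotting the replicated curves over the observed one, since a single statistic can hide systematic misfit in part of the time course.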
So the ODE with manual gradients example uses the SMC sampler, but not the ABC method. The post suggests that SMC is preferable to NUTS for parameter estimation on ODEs; any reason to think that’s not true? Also, if I have a hierarchical model on my parameters, with GPs at the top level, any reason to think SMC might not behave well?
SMC will outperform NUTS if the posterior is multimodal with well-separated modes, and it should also work fine when a model has a mix of continuous and discrete variables. But SMC scales poorly as the number of parameters increases and/or when the geometry of the posterior becomes “weird”. This is a direct consequence of SMC (at least the version implemented in PyMC3) not using gradient information and hence being unaware of the posterior’s geometry.
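For intuition about why a gradient-free SMC copes with well-separated modes, here is a toy tempered SMC in pure Python. It is a sketch under stated assumptions, not PyMC3's implementation: the 1-D bimodal target, the fixed linear tempering schedule, multinomial resampling, and the random-walk Metropolis settings are all chosen for illustration.

```python
import math
import random

rng = random.Random(42)


def log_prior(x):
    """Assumed wide prior, N(0, 5^2), up to an additive constant."""
    return -0.5 * (x / 5.0) ** 2


def log_like(x):
    """Assumed bimodal likelihood: equal mixture of N(-3, 0.5^2) and N(3, 0.5^2)."""
    a = -0.5 * ((x + 3.0) / 0.5) ** 2
    b = -0.5 * ((x - 3.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(0.5 * math.exp(a - m) + 0.5 * math.exp(b - m))


def log_target(x, beta):
    """Tempered posterior: prior times likelihood raised to the power beta."""
    return log_prior(x) + beta * log_like(x)


def tempered_smc(n_particles=1000, n_stages=10, n_mcmc=5, step=0.5):
    particles = [rng.gauss(0.0, 5.0) for _ in range(n_particles)]  # draw from prior
    beta = 0.0
    for stage in range(1, n_stages + 1):
        new_beta = stage / n_stages
        # Importance weights for moving from beta to new_beta.
        log_w = [(new_beta - beta) * log_like(x) for x in particles]
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
        beta = new_beta
        # Mutation: a few gradient-free random-walk Metropolis steps per particle.
        moved = []
        for x in particles:
            for _ in range(n_mcmc):
                proposal = x + rng.gauss(0.0, step)
                log_ratio = log_target(proposal, beta) - log_target(x, beta)
                if rng.random() < math.exp(min(0.0, log_ratio)):
                    x = proposal
            moved.append(x)
        particles = moved
    return particles


particles = tempered_smc()
frac_right = sum(x > 0 for x in particles) / len(particles)
print("fraction of particles in the right-hand mode:", frac_right)
```

Because the particles start spread over the prior and the likelihood is switched on gradually, both modes stay populated. The same mechanism is what struggles in high dimensions: a random-walk mutation step explores poorly when there are many parameters or strong correlations.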
We recently changed SMC to run more than one chain (optionally in parallel) in order to make it easier to compute diagnostics. The R-hat statistic and the rank plots (az.plot_trace(trace, kind="rank_vlines") or az.plot_trace(trace, kind="rank_bars")) seem to be very good at detecting problems (even though they were created to diagnose MCMC, not SMC). We are working on new diagnostics and ways to improve SMC, so if you get the chance to compare NUTS vs SMC, or just to use SMC for some problem, we will be happy to get some feedback.
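As a rough illustration of why several chains are needed for R-hat, here is the classic (non-split) Gelman-Rubin statistic in plain Python. This is a simplified sketch: ArviZ computes a rank-normalized split R-hat, so its numbers will differ, and the fake chains below are assumptions standing in for sampler output.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Classic R-hat for a list of equal-length chains of one scalar parameter."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance; B: between-chain variance scaled by n.
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


rng = random.Random(3)

# Four chains that all explore the same distribution: R-hat should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Four chains stuck in two different modes: R-hat should be far above 1.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (-3.0, -3.0, 3.0, 3.0)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
print("R-hat (well mixed):", round(rhat_mixed, 3))
print("R-hat (stuck in modes):", round(rhat_stuck, 3))
```

With a single chain there is no between-chain variance to compare against, which is why running multiple SMC chains makes this diagnostic available.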
That makes sense, thanks! For fitting a single curve it behaved well, but once I tried fitting multiple curves simultaneously it got very bogged down, and the result was a bit weird. It could likely improve with more tuning on my part, but it seems like it wouldn’t really outperform NUTS. I still feel like evaluating the likelihood of each timeseries data point independently isn’t entirely appropriate, but I don’t know much about these things.
This is a slightly different implementation than I was referring to in my original post, but this paper presents an ABC-SMC approach to Bayesian design of a reaction network that I’m very interested in: https://www.pnas.org/content/108/37/15190
Thanks for the paper, I will read it. There are some SMC-ABC methods specially tailored to ODEs. It seems these methods are capable of generating better proposal distributions. I have not read those papers, but it seems the implementation in PyMC3 could accommodate those methods.