Combining traces for different samplers

I am running MCMC for an inverse problem which involves a call to an expensive black box simulation at each sampling step.

  • Since the only way to compute the gradients NUTS needs is by finite differences with respect to the inference parameters, NUTS is prohibitively expensive to run on its own (see the sketch after this list).
  • Metropolis does not require gradients and explores the posterior well locally, but not globally. It’s roughly a factor of 100 cheaper per iteration than NUTS.
  • I would like to sample with Metropolis for most of the iterations and, every so often, do a few NUTS or HMC iterations to better traverse the typical set.
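
To make the cost concrete: each NUTS gradient has to be assembled from finite differences, so every leapfrog step costs on the order of (number of parameters + 1) forward solves. A rough sketch of that gradient evaluation (illustrative only; `log_post` stands in for my expensive posterior evaluation):

```python
import numpy as np

def fd_grad(log_post, theta, eps=1e-6):
    """Forward-difference gradient of the log posterior.

    One call costs len(theta) + 1 evaluations of log_post, each of which
    runs the black-box forward simulation, which is why gradient-based
    steps are so much more expensive than a single Metropolis proposal.
    """
    base = log_post(theta)                      # one forward solve
    grad = np.empty_like(theta, dtype=float)
    for i in range(len(theta)):                 # one extra forward solve per parameter
        perturbed = theta.copy()
        perturbed[i] += eps
        grad[i] = (log_post(perturbed) - base) / eps
    return grad
```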

I can create a compound step involving 100 MH iterations and 1 NUTS iteration, for example, but pm.sample only saves the parameter values at the end of the combined 101-step update to the trace. How can I keep the intermediate Metropolis steps as well? I don’t want to restart the trace each time, because I want to take advantage of all the tuning done for both samplers up to that point.
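
Here is a stripped-down version of the setup I mean (the model and names are placeholders; in the real problem the likelihood wraps the black-box simulation in a custom Op, and the 100:1 ratio comes from looping Metropolis inside a custom step method):

```python
import pymc3 as pm

with pm.Model() as model:
    # Placeholder parameters; in practice the likelihood wraps the expensive
    # black-box simulation (e.g. through a custom Theano Op).
    theta = pm.Normal("theta", mu=0.0, sigma=1.0, shape=3)
    pm.Normal("obs", mu=theta.sum(), sigma=0.1, observed=1.0)

    # Compound step: each step method in the list is applied in turn on every
    # pm.sample iteration, and only the state after the full combined update
    # is written to the trace.
    step = [pm.Metropolis([theta]), pm.NUTS([theta])]
    trace = pm.sample(draws=1000, tune=500, step=step, cores=1)
```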

From reading the documentation, it’s unclear whether a MultiTrace object can carry the tuning information forward. I don’t want to start a brand-new trace every time and manually link the pieces unless the tuning state can be preserved!

I don’t have a specific solution for the problem you’ve presented regarding the tuning information and trace objects, but I will add my two cents on using NUTS versus MH since I’ve also had experience hooking up PyMC3 to outside simulation code.

  • I think that the alternating NUTS / MH scheme is usually not any better than just using NUTS, even if the latter takes an enormous amount of time to draw samples. Keep in mind that while 500 tune / 500 draws is the default, your chains may mix well with far fewer iterations than that.
  • If you can spare the compute time, it may be worth running a NUTS chain with 50 tune / 50 draws, or even fewer, to see whether the samples look good. Remember to compare runs using the effective sample size rather than just the number of iterations completed (see the ArviZ sketch after this list).
  • In the unlikely event that your black-box simulation is written in Python, you could consider getting the log-posterior gradients by automatic differentiation with JAX (a minimal sketch follows below).
  • Do you think the forward model evaluations are cheap enough to use Sequential Monte Carlo?
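
On the effective sample size point: ArviZ will compute ESS for you, so you can put a short NUTS run and a long Metropolis run on the same footing by normalising ESS by compute cost. A minimal sketch, assuming you have the MultiTrace (or InferenceData) and the wall-clock time of each run:

```python
import arviz as az

def ess_per_second(trace, sampling_time_s):
    """Effective sample size per second of compute, per parameter.

    trace is a PyMC3 MultiTrace or an ArviZ InferenceData; sampling_time_s
    is the wall-clock time the run took.
    """
    idata = trace if isinstance(trace, az.InferenceData) else az.from_pymc3(trace)
    ess = az.ess(idata)            # xarray Dataset with ESS per parameter
    return ess / sampling_time_s   # compare NUTS vs. Metropolis on cost, not iterations
```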
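
And on the JAX point: if the forward model can be expressed in JAX-traceable Python, jax.grad gives exact log-posterior gradients with no finite differencing; you would still need to expose them to PyMC3 through a custom Theano Op. A toy sketch (the forward model, prior, and noise level here are made up for illustration):

```python
import jax
import jax.numpy as jnp

def forward_model(theta):
    # Stand-in for the simulation; the real one would have to be written in
    # (or ported to) JAX for autodiff to apply.
    return jnp.sum(jnp.sin(theta))

def log_posterior(theta, data, noise_sd=0.1):
    pred = forward_model(theta)
    log_lik = -0.5 * jnp.sum(((data - pred) / noise_sd) ** 2)
    log_prior = -0.5 * jnp.sum(theta ** 2)   # standard normal prior
    return log_lik + log_prior

# Exact gradient of the log posterior with respect to theta via autodiff.
grad_log_post = jax.grad(log_posterior)
print(grad_log_post(jnp.zeros(3), 1.0))
```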