I want to do Bayesian Inference of parameters of a black-box simulator function. The issue is that the simulator is very slow, taking many seconds or minutes for a single evaluation of the model at some parameters. Limiting to a minimum viable problem situation reduces the time it takes for one evaluation of the simulator to ~milliseconds. However, an advantage is that I do not have a large number of parameters I want to infer (< 10), and not a lot of data points for each call of the simulator (~200).
At the moment I am using Approximate Bayesian Computation with Sequential Monte Carlo in a way that is similar to what is described in this example of the PyMC
documentation. For the minimum viable problem, this works very well, I am getting the expected distribution for the posterior with a runtime of roughly 15 minutes on my laptop. Looking at plot_pair
shows that there is a non-trivial correlation structure between the parameters:
However trying this same approach for the realistic scenarios takes prohibitively long, because the simulator evaluation is so much more expensive.
I tried to tune the number of draws and epsilon
in pymc.Simulator()
to reduce the runtime while still giving a meaningful result, which already helped, but not enough. Naturally, I am also working on optimising the simulation function, however it is very likely impossible to reduce the runtime fundamentally.
Doing plain MCMC sampling with Slice doesn’t give the expected results even for the minimal version of the problem. Using it for the realistic scenario takes prohibitively long.
I also tried implementing the gradient of the model via a finite difference approach, and tried the NUTS sampler for MCMC as well as the variational inference methods implemented by PyMC (ADVI, FullRankADVI, SVGD). However all of these do not converge to the expected posterior even for the minimal viable model and are prohibitively expensive for the realistic simulator scenario as well.
I asked a similar question on StackExchange, where it was recommended to use Variational Sequential Monte Carlo, which however is not directly supported by PyMC and I could not get it to work.
What are approaches that I could try to deal with such an expensive simulation function? How could I optimise the number of times this simulation function needs to be called as much as possible? Or is there an entirely different approach, which is better suited to this issue? (Apologies if I am missing something obvious, I am quite new to Bayesian inference.)