I am taking my baby steps with ABC-SMC, and I first made a pure numpy/python based simulator, not optimized or anything. However, when I tried to use it in PyMC (sample_smc) it was terribly slow. My simulator has a while True loop and perhaps some other non-idealities as well (a non-homogeneous Poisson process defining the number of samples as the simulation advances), but it runs ok with numpy. I tried writing the simulator using PyMC (following https://www.pymc-labs.com/blog-posts/simulating-data-with-pymc/ ), only to re-produce the incredibly slow behavior (0.48 seconds with numpy, 453 seconds with pymc simulating a dataset of 100 samples).

With my obvious lack of skills and unoptimized code for PyTensor/PyMC, I wondered whether it would be possible to run the simulator in the SMC-sampling scheme without any PyTensor conversions/C-compiling?

Thanks for the clarification. Could you please clarify which “technical advice”-section you are referring to?

Indeed, I tried writing the simulator without drawing in the loop but then I ran into the problem that I needed to draw/evaluate the values to be able to use boolean operators in the control logic.

How do you then explain the slowness of the sample_smc if it does not touch my numpy code?
Perhaps I should reduce the simulator to a mwe.

Edit: Oh yes I found the technical advice at the end of the example.

I don’t know how you are timing, but SMC-ABC needs to do many evaluations per iteration in order to update the proposal. It should certainly be orders of magnitude slower than running the simulator once.

Precompiling the random samplers with the pymc compile_pymc utility led to similar performance in the simulator as with numpy

I moved to linux and out of jupyter notebook to rule out issues with windows or notebook

Now with the working progress bar I noticed that one or more of the chains seems to get stuck

Tightening the priors around hyperparameter true values (with simulated observed data) and loosening epsilon for smc let me get my first real sample for this model

I had 100 draws (/stage?) with 8 chains (and 8 cores) in this computation, which took 25 minutes to complete. All of my cpu cores are being fully utilized. For a simple comparison, if we measure roughly 0.5 seconds per draw and, hypothetically, have 10 stages for the beta evolution of SMC (assuming each stage has 100 draws as well), it would make 0.5x100x10 = 500 seconds? Sure, as you mentioned, there are more things being calculated under the hood, but should it be this big discrepancy from my rough estimation?

Each stage runs a metropolis sampler multiple times, and there’s also the number of particles, which by default is 2000 per chain? You may want to reduce that if your model is converging well with less particles. The particles are defined by the “draws” argument.

Great, I am starting to get hang of it. Sorry for using this thread documenting my learning, but in the documentation the summary of the algorithm finally let me understand the parameters. So if I want to have more samples from the posterior I will indeed increase draws parameter but for debugging purposes if I want to get the annealing process done faster I should reduce the threshold parameter. SMC attempts to get an effective sample size of \mathrm{N}_\mathrm{eff}=draws\timesthreshold at each stage of the annealing.

Will I run into problems with SMC if I write the simulator using numba and gpu acceleration?

May need to disable multi core on sample_smc but it would be fun to try. You can compile the pymc simulator to jax with mode="JAX" when you call compile_pymc which gives you jitted gpu friendly code without any extra work.

In a few weeks you’ll also be able to compile to numba automatically, but only cpu. For gpu you have to write it yourself which doesn’t seem to be a problem for you anyway