Performance of draw() vs. pymc3's draw_values()

Context: I have a custom rejection sampler that uses the model-building aspects of pymc (and MCMC in some cases) to define random variables and their relations. I’ve then been using pymc/pymc3 to do the prior sampling before the rejection step. I’m only now updating the code to be compatible with pymc v5 (it was built on pymc3 and has been stuck in the past for a while because of lack of time!).

With pymc3, I was using draw_values() to do the prior sampling; as I now understand it, its equivalent in pymc is draw(). However, I’ve noticed a big performance difference between the two: in typical use cases of the rejection sampling, I need to generate millions of samples. In pymc3 this took seconds, but in pymc it is taking many minutes.

From looking a little under the hood, my guess is that this is because draw() always uses a list comprehension to do the sampling, calling the compiled function once per requested sample, so the cost scales linearly with the number of samples: pymc/pymc/sampling/forward.py (pymc-devs/pymc, commit 244fb97b01ad0f3dadf5c3837b65839e2a59a0e8)

In pymc3's draw_values(), by contrast, the sampling looks like it was vectorized, e.g. pymc/pymc3/distributions/distribution.py (pymc-devs/pymc, commit bfc3813367592e62cc25be5abf0484d417972d84)
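The difference described above can be illustrated with a plain NumPy analogy (this is not PyMC's internal code, and the parameters `mu`, `n` are made up for illustration): a loop makes one sampler call per draw, while a vectorized call prepends the draw axis on the left and makes a single call. Both produce arrays of the same shape and distribution, but the loop pays per-call Python overhead `n` times.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.0, 5.0, -3.0])  # hypothetical parameters of a 3-dim random variable
n = 50_000                       # number of samples requested

# Loop-style sampling: one sampler call per draw, so overhead grows linearly with n
# (the list-comprehension pattern the question attributes to draw()).
looped = np.stack([rng.normal(mu, 1.0) for _ in range(n)])

# Vectorized sampling: one call, with the sample axis as the leftmost dimension.
vectorized = rng.normal(mu, 1.0, size=(n, *mu.shape))
```

Both `looped` and `vectorized` have shape `(n, 3)`; only the number of sampler calls differs.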

Is there any way to speed up pymc’s draw() when generating a large number of samples?


@ricardoV94 @lucianopaz

The biggest bottleneck with pm.draw is that it compiles the random function on every call. For performance, you should compile the function once and keep it around. Something like:

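```python
from pymc.pytensorf import compile_pymc

...  # model definition; `rvs` is the list of random variables to sample

# Compile the forward-sampling function once and reuse it
fn = compile_pymc([], rvs, random_seed=123)
fn()  # each call returns a fresh draw of every variable in `rvs`
fn()
```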

For vectorization you may need to redefine the model to have batch dimensions on the left. You could use pytensor.graph.replace.vectorize_graph but it’s a bit tricky. We should add a utility for that.
