Performance of draw() vs. pymc3's draw_values()

adrn · March 5, 2024, 2:56pm

Context: I have a custom rejection sampler that uses the model-building aspects of pymc (and MCMC in some cases) to define random variables and their relations. I’ve then been using pymc/pymc3 to do the prior sampling before the rejection step. I’m only now updating the code to be compatible with pymc v5 (it was built on pymc3 and has been stuck in the past for a while because of lack of time!).

With pymc3, I was using draw_values() to do the prior sampling, which I now understand is just draw() in pymc. However, I’ve noticed a big performance difference between the two. In typical use cases of the rejection sampling, I need to generate millions of samples: in pymc3 this took seconds, but in pymc v5 it takes many minutes.

From looking a little under the hood, my guess is that this is because draw() always uses a list comprehension to do the sampling (pymc/sampling/forward.py at commit 244fb97b01ad0f3dadf5c3837b65839e2a59a0e8 in pymc-devs/pymc), so the sampling function is called once per requested sample and the Python-level cost grows with every draw. But back in the pymc3 draw_values(), the sampling looks like it was vectorized (e.g., pymc3/distributions/distribution.py at commit bfc3813367592e62cc25be5abf0484d417972d84 in pymc-devs/pymc).

Is there any way to speed up pymc’s draw() when generating a large number of samples?
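To illustrate the kind of difference I mean, here is a minimal sketch in plain NumPy rather than pymc (so it only mimics the two patterns, it is not the actual pymc code path). The variable names and the sample count are made up for the illustration:

```python
import time

import numpy as np

rng = np.random.default_rng(123)
n_samples = 200_000  # arbitrary size, chosen only for illustration

# Pattern 1: one Python-level call per sample (list-comprehension style)
t0 = time.perf_counter()
looped = np.array([rng.normal() for _ in range(n_samples)])
t_loop = time.perf_counter() - t0

# Pattern 2: a single call that returns all samples at once (vectorized)
t0 = time.perf_counter()
vectorized = rng.normal(size=n_samples)
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.3f} s, vectorized: {t_vec:.3f} s")
```

Both versions produce an array of the same shape; the only difference is how many times Python has to call into the sampler.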

cluhmann · March 5, 2024, 3:19pm

@ricardoV94 @lucianopaz

ricardoV94 · March 5, 2024, 4:49pm

The biggest bottleneck with pm.draw is that it compiles the random function on every call. For performance you should compile the function once and keep it around. Something like:

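from pymc.pytensorf import compile_pymc

...  # define the model here; rvs holds the random variable(s) to draw

# Compile once, then reuse the compiled function
fn = compile_pymc([], rvs, random_seed=123)
fn()  # each call returns a fresh set of draws
fn()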

For vectorization you may need to redefine the model to have batch dimensions on the left. You could use pytensor.graph.replace.vectorize_graph but it’s a bit tricky. We should add a utility for that.
