Context: I have a custom rejection sampler that uses the model-building aspects of pymc (and MCMC in some cases) to define random variables and their relations. I’ve then been using pymc/pymc3 to do the prior sampling before the rejection step. I’m only now updating the code to be compatible with pymc v5 (it was built on pymc3 and has been stuck in the past for a while because of lack of time!).
With pymc3, I was using `draw_values()` to do the prior sampling, which I now understand is just `draw()` in pymc. However, I’ve noticed a big performance difference between the two: in typical use cases of the rejection sampling, I need to generate millions of samples. In pymc3, this took seconds, but in pymc it is taking many minutes.
From looking a little under the hood, my guess is that this is because `draw()` always uses this list comprehension to do the sampling, so it evaluates the draw function once per requested sample and the Python-level overhead scales linearly with the number of samples: pymc/pymc/sampling/forward.py at 244fb97b01ad0f3dadf5c3837b65839e2a59a0e8 · pymc-devs/pymc · GitHub
But back in pymc3’s `draw_values()`, the sampling looks like it was vectorized, e.g., pymc/pymc3/distributions/distribution.py at bfc3813367592e62cc25be5abf0484d417972d84 · pymc-devs/pymc · GitHub
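To make the difference concrete, here is a minimal sketch of the two patterns as I understand them. It uses plain NumPy, not pymc's actual code, and the function names `draw_looped` and `draw_vectorized` are made up for illustration. The first makes one Python-level call per sample; the second requests all samples in a single call.

```python
import numpy as np


def draw_looped(rng, n_draws):
    # One Python-level call per sample (hypothetical stand-in for a
    # per-draw list comprehension; not pymc's real implementation).
    return np.array([rng.normal(0.0, 1.0) for _ in range(n_draws)])


def draw_vectorized(rng, n_draws):
    # A single call that generates every sample at once
    # (hypothetical stand-in for vectorized sampling).
    return rng.normal(0.0, 1.0, size=n_draws)


n_draws = 10_000
looped = draw_looped(np.random.default_rng(0), n_draws)
vectorized = draw_vectorized(np.random.default_rng(0), n_draws)
```

Both return an array of shape `(n_draws,)`; wrapping each call in `time.perf_counter()` should show how much per-call overhead the looped version adds as `n_draws` grows.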
Is there any way to speed up pymc’s `draw()` when generating a large number of samples?