sample_posterior_predictive is slow

I have a relatively large hierarchical Bayesian model that I have already trained.

I am now generating 4000 posterior predictive samples from that model, and this is surprisingly time-consuming. Here are the start and end times of successive calls:

10:27:09,322 to 10:27:40,984
10:27:43,600 to 10:28:17,868
10:28:20,529 to 10:28:55,941
10:28:58,618 to 10:29:31,605
10:29:34,236 to 10:30:09,253
10:30:11,892 to 10:30:46,837
10:30:49,377 to 10:31:22,395
10:31:24,959 to 10:32:01,240
10:32:03,889 to 10:32:36,758
10:32:39,356 to 10:33:11,837
10:33:14,330 to 10:33:48,645

Each call to sample_posterior_predictive() takes more than 30 s, which is unacceptable for our application. Does anyone know why this would be so slow?


I think more information would help - what is the observed part of your model?

Here is a picture of the model from which we are sampling:

The gray nodes at the top are predictors supplied as input, and the obs value at the bottom is what we are predicting.

Note that this sampling model is a subset of the trained model: each of these sub-models is a model of a logic gate (implemented in an engineered yeast cell). We do the posterior predictive separately for each gate, but we train the full model together (because there are growth conditions that we believe have similar effects across all of the logic gates).

The trained model is too big to draw: basically it has six copies of the above, with the regression parameters at the top right of this figure shared across the six copies.

Hmm, this definitely sounds like a bug somewhere. For what it's worth, though, the model and the forward-sampling logic you describe (i.e., sampling from a subset of the model) sound complicated enough that implementing your own forward-sampling function might be more straightforward.
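For anyone who wants to try the hand-written route, here is a minimal NumPy sketch of what a forward-sampling function could look like. It is not the poster's model: the parameter names (alpha, beta, sigma), the Normal likelihood, and the fake posterior draws are all assumptions made for illustration. The idea is to take the posterior draws as arrays and push new inputs through the model by hand, with per-gate intercepts and a regression slope shared across gates.

```python
import numpy as np

def forward_sample(posterior, x_new, gate_idx, rng):
    """Return one predictive draw per posterior draw and per new input.

    Hypothetical model: obs ~ Normal(alpha[gate] + beta * x, sigma).
    """
    alpha = posterior["alpha"]            # shape (S, G): per-gate intercepts
    beta = posterior["beta"][:, None]     # shape (S, 1): slope shared by all gates
    sigma = posterior["sigma"][:, None]   # shape (S, 1): observation noise
    mu = alpha[:, gate_idx] + beta * x_new[None, :]   # shape (S, N)
    return rng.normal(mu, sigma)          # broadcasts to shape (S, N)

# Fake posterior draws standing in for a real trace (assumption, not real data).
rng = np.random.default_rng(0)
n_draws, n_gates = 4000, 6
fake_posterior = {
    "alpha": rng.normal(0.0, 1.0, size=(n_draws, n_gates)),
    "beta": rng.normal(2.0, 0.1, size=n_draws),
    "sigma": np.abs(rng.normal(0.5, 0.05, size=n_draws)),
}

x_new = np.array([0.0, 0.5, 1.0])   # new predictor values
gate_idx = np.array([0, 0, 3])      # which gate each new input belongs to
pred = forward_sample(fake_posterior, x_new, gate_idx, rng)
print(pred.shape)
```

With a real trace, the dictionary would be filled from the sampled variables instead of random numbers, and the body of forward_sample would mirror whatever the model's likelihood actually is.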

After more discussion, it seems clear that there’s a general issue about sample_posterior_predictive – it is very cleanly implemented, but is not efficient, and needs some love. I’m working on a rewrite.


I’m also facing issues with it: even with a very simple model, it is very slow.

@perone – If you have a recent version of PyMC3 please try pm.fast_sample_posterior_predictive(). This is a vectorized version of pm.sample_posterior_predictive() and should be much faster.
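To give a feel for why vectorizing helps, here is a small NumPy-only comparison. It is an illustration of the general idea, not PyMC3's actual implementation: the arrays mu and sigma are made-up stand-ins for per-draw posterior values, and both function names are hypothetical. One version draws predictive samples with a Python loop over posterior draws, the other makes a single broadcast call.

```python
import time
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_obs = 4000, 200
mu = rng.normal(size=(n_draws, n_obs))   # made-up per-draw means
sigma = np.full(n_draws, 0.3)            # made-up per-draw noise scales

def predictive_loop(mu, sigma, rng):
    """One call to the random generator per posterior draw."""
    out = np.empty_like(mu)
    for i in range(mu.shape[0]):
        out[i] = rng.normal(mu[i], sigma[i])
    return out

def predictive_vectorized(mu, sigma, rng):
    """A single call that broadcasts over all posterior draws."""
    return rng.normal(mu, sigma[:, None])

t0 = time.perf_counter()
loop_draws = predictive_loop(mu, sigma, rng)
t1 = time.perf_counter()
vec_draws = predictive_vectorized(mu, sigma, rng)
t2 = time.perf_counter()

print(f"loop:       {t1 - t0:.4f} s")
print(f"vectorized: {t2 - t1:.4f} s")
```

Both functions sample from the same distribution; only the number of Python-level calls differs. The gap is likely to grow when each per-draw step is more expensive than a single NumPy call.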


@rpgoldman holy moly, I confess that I had to check the answer to believe it had actually done something. Thanks a lot, I owe you a beer.


Glad that worked for you! I had a problem where I needed to do a lot of generalization from my fitted model, and the original implementation of sample_posterior_predictive was just too slow for me.

It was really built primarily for model critiquing, not inference, I believe, and so was not optimized.