Sample_posterior_predictive slow

rpgoldman · July 15, 2019, 3:39pm

I have trained a relatively large hierarchical Bayesian model.

I am now generating 4000 posterior predictive samples from that model, and this is surprisingly time-consuming:

10:27:09,322 to 10:27:40,984
10:27:43,600 to 10:28:17,868
10:28:20,529 to 10:28:55,941
10:28:58,618 to 10:29:31,605
10:29:34,236 to 10:30:09,253
10:30:11,892 to 10:30:46,837
10:30:49,377 to 10:31:22,395
10:31:24,959 to 10:32:01,240
10:32:03,889 to 10:32:36,758
10:32:39,356 to 10:33:11,837
10:33:14,330 to 10:33:48,645

Each call to sample_posterior_predictive() takes a little over 30 s, which is unacceptable for our application. Does anyone know why this would be so slow?

Thanks!
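
For reference, the per-call durations can be worked out directly from the log lines above. The following is a minimal sketch using only the Python standard library; the helper name `call_durations` is made up for illustration, and it assumes each line is the start and end time of one call, with a comma separating the milliseconds.

```python
from datetime import datetime

# Start/end timestamps copied verbatim from the log in this post.
LOG = """\
10:27:09,322 to 10:27:40,984
10:27:43,600 to 10:28:17,868
10:28:20,529 to 10:28:55,941
10:28:58,618 to 10:29:31,605
10:29:34,236 to 10:30:09,253
10:30:11,892 to 10:30:46,837
10:30:49,377 to 10:31:22,395
10:31:24,959 to 10:32:01,240
10:32:03,889 to 10:32:36,758
10:32:39,356 to 10:33:11,837
10:33:14,330 to 10:33:48,645
"""


def call_durations(log):
    """Return the elapsed seconds for each 'start to end' line (hypothetical helper)."""
    fmt = "%H:%M:%S,%f"  # %f reads the digits after the comma as a fraction of a second
    durations = []
    for line in log.strip().splitlines():
        start, end = (datetime.strptime(t, fmt) for t in line.split(" to "))
        durations.append((end - start).total_seconds())
    return durations


durations = call_durations(LOG)
mean_duration = sum(durations) / len(durations)
print(f"{len(durations)} calls, mean {mean_duration:.1f} s, "
      f"min {min(durations):.1f} s, max {max(durations):.1f} s")
# prints: 11 calls, mean 33.9 s, min 31.7 s, max 36.3 s
```

The gaps of roughly 2.5 s between the end of one line and the start of the next are not included in these figures.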

junpenglao · July 15, 2019, 4:04pm

I think more information would help - what is the observed part of your model?

rpgoldman · July 15, 2019, 4:16pm

Here is a picture of the model from which we are sampling:

The gray nodes at the top are predictors supplied as input, and the obs value at the bottom is what we are predicting.

Note that this sampling model is a subset of the trained model: each of these sub-models is a model of a logic gate (implemented in an engineered yeast cell). We do the posterior predictive separately for each gate, but we train the full model together (because there are growth conditions that we believe have similar effects across all of the logic gates).

The trained model is too big to draw: basically it has six copies of the above, with the regression parameters at the top right of this figure shared across the six copies.

junpenglao · July 15, 2019, 5:28pm

Hmm, this definitely sounds like a bug somewhere. For what it's worth, though, the model and the forward-sampling logic you describe (i.e., sampling from a subset of the model) sound complicated enough that implementing your own forward-sampling function might be more straightforward.
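
To make the suggestion concrete, here is a rough sketch of what a hand-written forward-sampling function could look like in NumPy. It is not the model from this thread: it assumes a toy linear regression with a Normal likelihood, the keys "intercept", "slope" and "sigma" are hypothetical, and the posterior draws are synthetic stand-ins for values that would normally be read out of a fitted trace.

```python
import numpy as np


def forward_sample(posterior, x_new, rng):
    """Draw one predictive value per posterior draw and per entry of x_new.

    posterior: dict of 1-D arrays, one element per posterior draw, with the
               hypothetical keys "intercept", "slope" and "sigma".
    x_new:     1-D array of predictor values.
    Returns an array of shape (n_draws, len(x_new)).
    """
    # Add a trailing axis so each draw broadcasts against every predictor.
    intercept = posterior["intercept"][:, None]
    slope = posterior["slope"][:, None]
    sigma = posterior["sigma"][:, None]
    mu = intercept + slope * x_new[None, :]
    return rng.normal(mu, sigma)


rng = np.random.default_rng(0)
n_draws = 4000

# Synthetic stand-ins for posterior draws (assumption: in practice these
# would come from the trace of the trained model).
posterior = {
    "intercept": rng.normal(1.0, 0.1, size=n_draws),
    "slope": rng.normal(2.0, 0.1, size=n_draws),
    "sigma": np.abs(rng.normal(0.5, 0.05, size=n_draws)),
}
x_new = np.linspace(0.0, 1.0, 5)

ppc = forward_sample(posterior, x_new, rng)
print(ppc.shape)  # (4000, 5)
```

Because only the variables needed for one sub-model are passed in, a function like this sidesteps the question of how to sample from a subset of a larger model.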

rpgoldman · July 16, 2019, 11:30pm

After more discussion, it seems clear that there’s a general issue with sample_posterior_predictive: it is very cleanly implemented, but it is not efficient and needs some love. I’m working on a rewrite.

perone · April 17, 2020, 7:53pm

I’m also facing issues with it: even with a very simple model, it is very slow.

rpgoldman · April 17, 2020, 8:04pm

@perone – If you have a recent version of PyMC3 please try pm.fast_sample_posterior_predictive(). This is a vectorized version of pm.sample_posterior_predictive() and should be much faster.
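
To illustrate what "vectorized" means here, the sketch below contrasts drawing predictive samples one posterior draw at a time with drawing them all in a single broadcast call. This is plain NumPy on a made-up Normal model, not PyMC3's actual implementation; the function names `looped` and `vectorized` and the synthetic `mu` and `sigma` arrays are assumptions for illustration only.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n_draws, n_obs = 4000, 200

# Synthetic stand-ins for posterior draws of a Normal model's parameters.
mu = rng.normal(0.0, 1.0, size=n_draws)
sigma = np.abs(rng.normal(1.0, 0.1, size=n_draws))


def looped(mu, sigma, n_obs, rng):
    """One random-number call per posterior draw."""
    out = np.empty((len(mu), n_obs))
    for i in range(len(mu)):
        out[i] = rng.normal(mu[i], sigma[i], size=n_obs)
    return out


def vectorized(mu, sigma, n_obs, rng):
    """A single call; parameters broadcast across the observation axis."""
    return rng.normal(mu[:, None], sigma[:, None], size=(len(mu), n_obs))


t0 = time.perf_counter()
slow = looped(mu, sigma, n_obs, rng)
t1 = time.perf_counter()
fast = vectorized(mu, sigma, n_obs, rng)
t2 = time.perf_counter()

print(f"looped:     {t1 - t0:.4f} s, shape {slow.shape}")
print(f"vectorized: {t2 - t1:.4f} s, shape {fast.shape}")
```

Both versions return an array with one row per posterior draw; the difference is only in how many Python-level calls are needed to fill it.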

perone · April 17, 2020, 8:46pm

@rpgoldman Holy moly, I confess that I had to check the answer to believe it had indeed done something. Thanks a lot, I owe you a beer.

rpgoldman · April 17, 2020, 10:21pm

Glad that worked for you! I had a problem where I needed to do a lot of generalization from my fitted model, and the original implementation of sample_posterior_predictive was just too slow for me.

It was really built primarily for model critiquing, not inference, I believe, and so was not optimized.
