I’ve got a large number of models (one model fit on multiple datasets), and would rather not run pm.sample_posterior_predictive on each one in order to get samples.
Is it “OK” to just use the actual samples returned by pm.sample instead?
What I really want is the range of estimated parameters for a StudentT distribution, which I can then throw into a scipy.stats.t object to get to the ppf method. For example:
from scipy.stats import t

trace = traces[15]       # get the trace for the 15th model
mu = trace['mu']         # posterior samples returned by pm.sample
nu = trace['nu']         # posterior samples returned by pm.sample
scale = trace['sig']     # posterior samples returned by pm.sample

# get an array of t distributions parameterized by those samples
# (scipy's t takes df, loc, scale in that order)
studentT = t(nu, mu, scale)

# now find the ppf value for 0.4 for each of these t distributions:
ppf_values = studentT.ppf(0.4)
Hi Lewis,
Yes, I don’t see any problem with this – it’s basically what pm.sample_posterior_predictive does under the hood, and it’s what you have to do anyway when there is a bug (#ShapeIssue) in it for a given distribution.
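If you also want actual new observations (not just ppf values), here is a minimal sketch of doing it by hand, assuming traces[15] holds mu, nu and sig arrays of equal length from pm.sample and that your likelihood is the StudentT above:

import numpy as np
from scipy.stats import t

trace = traces[15]                       # posterior samples from pm.sample
mu, nu, scale = trace['mu'], trace['nu'], trace['sig']

# one simulated observation per posterior draw: parameter uncertainty comes
# from the trace, observation noise from the StudentT likelihood itself.
# This is essentially what pm.sample_posterior_predictive does, draw by draw.
y_new = t.rvs(df=nu, loc=mu, scale=scale)

# e.g. an interval for new observations
interval = np.percentile(y_new, [3, 97])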
Hope this helps
A related question. Why do we actually predict using sample_posterior_predictive instead of just calculating manually from all the sampled parameters?
My guess is that sample_posterior_predictive actually makes better predictions for values it can see in the trace. That would explain why I get a narrower HDI from sample_posterior_predictive than from “manually” calculating the values using the trace.
Hey Mattias - the naming doesn’t quite agree (that use case is really the prior predictive), but one reason to use predictive sampling over the raw posterior samples is in https://arxiv.org/pdf/1709.01449.pdf - sometimes you define a model and, before doing any MCMC, draw some samples just from the priors to assess whether they’re at least reasonable. Something like the sketch below.
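A rough sketch of such a prior predictive check (the model and parameter names here are just placeholders, and depending on your PyMC3 version you may need sigma instead of sd):

import numpy as np
import pymc3 as pm

y_obs = np.random.normal(size=100)      # stand-in for your real data

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=10)
    sig = pm.HalfNormal('sig', sd=5)
    nu = pm.Exponential('nu', 1 / 30)
    y = pm.StudentT('y', nu=nu, mu=mu, sd=sig, observed=y_obs)

    # draw from the priors only, before any MCMC, to eyeball whether the
    # implied data are on a sensible scale
    prior_pred = pm.sample_prior_predictive(samples=500)

# prior_pred['y'] holds 500 simulated datasets drawn from the priors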
Also, for convenience: it’s a simple one-liner, instead of having to basically rebuild the whole model by hand on top of the posterior samples – I’m much happier when PyMC3 does it for me automatically.
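That is, something like this (assuming model and trace are the model and trace for dataset 15, and that the observed variable is named 'y'):

import numpy as np
import pymc3 as pm

# one line, instead of reconstructing the likelihood by hand
ppc = pm.sample_posterior_predictive(trace, model=model)

# ppc['y'] has one simulated dataset per posterior draw
interval = np.percentile(ppc['y'], [3, 97])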
Although there is no conceptual difference when using sample_posterior_predictive to sample latent parameters, there is one when you sample actual posterior predictive samples, aka predictions, aka new observations, aka new ys (this has a lot of names). Then, sample_posterior_predictive allows you to integrate over all the uncertainty in the model: the uncertainty from the latent parameters, the uncertainty from the likelihood, and the uncertainty from the correlations between all those parameters.
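A quick numerical way to see this, assuming the same mu/nu/sig trace as in the earlier example: compare an interval built from the latent mean alone with one built from full predictive draws; the latter should be wider because it also carries the likelihood’s noise.

import numpy as np
from scipy.stats import t

trace = traces[15]
mu, nu, scale = trace['mu'], trace['nu'], trace['sig']

# uncertainty in the latent mean only
mu_interval = np.percentile(mu, [3, 97])

# full predictive uncertainty: latent parameters plus StudentT noise, with
# parameter correlations preserved because each posterior draw is used jointly
y_new = t.rvs(df=nu, loc=mu, scale=scale)
pred_interval = np.percentile(y_new, [3, 97])   # wider than mu_interval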