Are samples from the trace equivalent to samples from pm.sample_posterior_predictive?

Hi PyMC3ers,

I’ve got a large number of models (one model fit on multiple datasets), and would rather not run pm.sample_posterior_predictive on each one in order to get samples.

Is it “OK” to just use the actual samples returned by pm.sample instead?

What I really want is the range of estimated parameters for a StudentT distribution, which I can then throw into a scipy.stats.t object to get to the ppf method. For example:

from scipy.stats import t

trace = traces[15]    # get the trace for the 15th model
mu = trace['mu']      # these are the values returned by pm.sample
nu = trace['nu']      # these are the values returned by pm.sample
scale = trace['sig']  # these are the values returned by pm.sample

# scipy.stats.t is parameterized as t(df, loc, scale)
studentT = t(nu, mu, scale)  # an array of t distributions, one per posterior sample

# now find the ppf value for 0.4 for each of these t distributions:
ppf_values = studentT.ppf(0.4)
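(In case it helps anyone reading later, here's a self-contained version of the same idea with randomly generated stand-ins for the posterior samples — the `mu`, `nu`, and `scale` arrays below are hypothetical placeholders for what `pm.sample` would return:)

```python
import numpy as np
from scipy.stats import t

# Hypothetical stand-ins for posterior samples from pm.sample
rng = np.random.default_rng(0)
mu = rng.normal(0.0, 0.1, size=1000)      # would be trace['mu']
nu = rng.uniform(5.0, 30.0, size=1000)    # would be trace['nu']
scale = rng.uniform(0.8, 1.2, size=1000)  # would be trace['sig']

# scipy.stats.t is parameterized as t(df, loc, scale)
studentT = t(nu, mu, scale)

# ppf(0.4) evaluated under each of the 1000 posterior parameterizations
ppf_values = studentT.ppf(0.4)
assert ppf_values.shape == (1000,)
```

Since the t distribution is symmetric about `loc`, each `ppf(0.4)` value lands just below the corresponding `mu`, and the spread of `ppf_values` gives you the posterior uncertainty in that quantile.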

Thanks for your time!


Hi Lewis,
Yes, I don’t see any problem with that – it’s basically what pm.sample_posterior_predictive does under the hood, and it’s what you have to do anyway when there’s a bug (#ShapeIssue :wink:) in it for a given distribution.
Hope this helps :vulcan_salute:


A related question: why do we actually predict using sample_posterior_predictive instead of just calculating manually from all the sampled parameters?

My guess is that sample_posterior_predictive somehow does a better job predicting values it has seen in the trace. That would explain why I get a narrower HDI from sample_posterior_predictive than from computing the values “manually” using the trace.

Am I completely off? :slight_smile:

Great, thanks Alex!

Hey Mattias – the naming doesn’t quite agree, but I guess one reason to use sample_posterior_predictive over the actual samples is this: sometimes you might define a model and, before doing any MCMC, draw some samples just from the priors to assess whether they’re at least reasonable.

I thought that was what sample_prior_predictive was meant for :slight_smile:

1 Like

I’d say there are at least two reasons:

  1. For convenience: just using this simple one-liner instead of having to basically rebuild the whole model by operating on the posterior samples – I’m much happier when PyMC3 does it for me automatically :sweat_smile:
  2. Although there is no conceptual difference when using sample_posterior_predictive to sample latent parameters, there is one when you sample actual posterior predictive samples, aka predictions, aka new observations, aka new ys (this has a lot of names :grin:). Then, sample_posterior_predictive allows you to integrate over all the uncertainty in the model – the one from the latent parameters, the one from the likelihood, and the one from correlations between all those parameters.
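To make point 2 concrete, here's a rough sketch of the difference using NumPy/SciPy with made-up posterior draws (all the arrays below are hypothetical stand-ins for trace values). Summarizing only the latent `mu` draws ignores the likelihood noise; drawing a new y from the StudentT likelihood for each posterior draw – which is roughly what pm.sample_posterior_predictive does under the hood – folds that noise in, so the predictive interval comes out wider:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)

# Hypothetical posterior draws (stand-ins for trace['mu'], trace['nu'], trace['sig'])
mu = rng.normal(0.0, 0.1, size=2000)
nu = rng.uniform(5.0, 30.0, size=2000)
sig = rng.uniform(0.8, 1.2, size=2000)

# "Manual" summary: uncertainty in the latent mean only
mean_interval = np.percentile(mu, [3, 97])

# Posterior predictive: for each posterior draw, also sample a new y
# from the StudentT likelihood
y_new = t(df=nu, loc=mu, scale=sig).rvs(random_state=rng)
pred_interval = np.percentile(y_new, [3, 97])

# The predictive interval is wider, because it also includes likelihood noise
assert pred_interval[1] - pred_interval[0] > mean_interval[1] - mean_interval[0]
```

So if you do see a *narrower* interval from sample_posterior_predictive than from a manual computation, it's worth double-checking what exactly the manual version is summarizing.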