# Understanding shape of values returned by sample_posterior_predictive

With the following linear regression “toy” example I am trying to understand posterior predictive sampling:

``````k = 4
with pm.Model() as mdl:
sigma = pm.HalfFlat('sigma',shape=1)
beta = pm.Flat('beta',shape=(k,1))
mu = tt.dot(X, beta)
yobs = pm.Normal('yobs', mu=mu, sigma=sigma, observed=y)
trace = pm.sample(100, tune=500, target_accept=0.95, model=mdl)
post_pred = pm.sample_posterior_predictive(trace, model=mdl)
``````

y is a numpy vector with shape (41, 1)
X is a (41,4) numpy matrix

Why does

``````post_pred['yobs']
``````

have a shape of (400,41,1)? I was expecting a shape of (400,1): one new y for each of the 400 posterior values in trace. Can the second dimension (41) be set to some other value? In other words, how can sample_posterior_predictive() be used to sample predictions from N(X_new.dot(beta_PP), sigmaPP) (with X_new: new input matrix; beta_PP, sigma_PP: posterior samples)?

Hi Bastian,
Yeah, that’s the normal behavior: `pm.sample_posterior_predictive` will generate new samples for each of the 41 observations of your likelihood, and it does that for each of the 400 MCMC samples.

You have to change the value of `yobs` if you want true out-of-sample predictions. This notebook shows how to do that in PyMC3 – spoiler alert, you have to use the `Data` class Hope this helps 2 Likes

Thanks, Alex, it works!

For anyone interested, the above “toy” model has to be changed to something like this:

``````with pm.Model() as mdl:
sigma = pm.HalfFlat('sigma',shape=1)
beta = pm.Flat('beta',shape=(k,1))
X = pm.Data('X', X)
mu = tt.dot(X, beta)
yobs = pm.Normal('yobs', mu=mu, sigma=sigma, observed=y)
trace = pm.sample(1000, tune=2000, target_accept=0.95, model=mdl)
``````

Then out-of-sample predictions can be generated with

``````X_new = np.array([0,1,0,0]).reshape(1,4) # does not work without reshaping, X needs to be a row vector!
with mdl:
pm.set_data({'X': X_new }) # key line of code
pp = pm.sample_posterior_predictive(trace)
post_pred = pp['yobs'][:,0]
``````

`pp['yobs']` has a shape of (400,41), so the sampling for the one observation `X_new` seems to be repeated 41 times. I wonder why PyMC3 is doing that.

Hi Bastian,

The (400, 41) shape is a bug in PyMC3 (see this github issue for details), it should be (400, 1) when using `X_new`. As explained in the issue, if `X_new` were of shape (2,4) then the bug should not be triggered and the posterior predictive shape be (400, 2)

2 Likes