Silly question: 'zigzag' curves while making inference


I have a question to clear up my understanding.

When we run posterior predictive sampling on a model (pm.sample_posterior_predictive(trace)), we get a set of 1000 arrays (the default value). Each array consists of a number of points.

My question: why don't the points in each array follow a smooth trajectory?

I thought each array was obtained in two steps:

  1. randomly sampling from the posterior distributions to get parameter values;
  2. calculating the points from the formula given in the model, using the sampled parameters.

Or is each point perhaps obtained by its own random draw, with the points arranged in a matrix only for organizational purposes?

Thank you

Hard to tell without an example, but I think I see what you’re asking: posterior predictive sampling does not only draw from the posterior of the parameters; it also draws from the likelihood. This adds a layer of uncertainty and explains why the samples do not follow a “smooth trajectory”, as you put it.

For a linear regression, for instance, pm.sample_posterior_predictive doesn’t only compute the mean, which is a combination of intercept and slope (mu = a + b*x); it also samples from the likelihood of the data: pm.Normal(mu = a + b*x, sigma=sigma, observed=obs_data). So each point is the mean at that x plus its own random noise draw. You get “new data”, or retrodictions, as a result.
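Here is a minimal NumPy sketch of that difference. It does not call PyMC: the draws for a, b and sigma are made-up stand-ins for what a real trace would hold, and x is a hypothetical predictor with 10 values. The mean line for each draw is perfectly straight, while the predictive values built from it zigzag because the likelihood is sampled separately at every x.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0, 1, 10)  # 10 hypothetical predictor values

# Hypothetical posterior draws (stand-ins for values stored in a trace)
n_draws = 500
a = rng.normal(1.0, 0.1, size=n_draws)                # intercept
b = rng.normal(2.0, 0.1, size=n_draws)                # slope
sigma = np.abs(rng.normal(0.5, 0.05, size=n_draws))   # noise scale

# The mean for each draw is a straight line in x
mu = a[:, None] + b[:, None] * x                      # shape (500, 10)

# The likelihood is then sampled independently at every x
y_pred = rng.normal(mu, sigma[:, None])               # shape (500, 10)

# Second differences along x vanish for a straight line, but not for y_pred
mu_wiggle = np.abs(np.diff(mu, n=2, axis=1)).max()
pred_wiggle = np.abs(np.diff(y_pred, n=2, axis=1)).max()
print(mu_wiggle, pred_wiggle)
```

Plotting a row of mu against x gives a smooth line; plotting the same row of y_pred gives the zigzag from the original question.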

Is that clearer?

So the likelihood is sampled at every x. Got it, thank you!

Yeah, pm.sample_posterior_predictive basically draws from the posterior parameters 500 times by default. So, if you had 10 data points, your posterior predictive samples would have shape (500, 10): one row per draw, one column per data point.
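To make the two axes concrete, here is a small sketch that uses a placeholder array of that shape. The name ppc and its random contents are assumptions for illustration only, not real PyMC output.

```python
import numpy as np

rng = np.random.default_rng(1)

n_samples, n_points = 500, 10  # the numbers used in the reply
# Placeholder standing in for a posterior predictive array
ppc = rng.normal(size=(n_samples, n_points))

one_dataset = ppc[0]     # one simulated dataset: one value per data point
one_point = ppc[:, 0]    # all simulated values for the first data point

# Summaries per data point are taken across samples (axis 0)
point_mean = ppc.mean(axis=0)                        # shape (10,)
lower, upper = np.percentile(ppc, [3, 97], axis=0)   # each of shape (10,)
```

Averaging over axis 0 is what smooths out the per-point noise; a single row will always look jagged.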