Bug in fast sample posterior predictive?

As I understand it, posterior predictive checking (PPC) is helpful in three situations:

  1. Sample from the observed distribution to see whether the model's conclusions actually resemble your data. To do this, a dataset as large as the original one is sampled using the same likelihood distribution, conditioning on the old inputs and the posterior parameters that were just estimated. This is the standard behavior of the PyMC method. In the simplest linear model with a Gaussian likelihood, it's in essence a shortcut for:

     ```python
     ppc_ys = scipy.stats.norm(trace['intercept'] + old_xs * trace['slope'], trace['sigma']).rvs()
     ```
  2. Sample from the observed distribution to see what predictions would be made given new inputs, while keeping the posterior values unchanged (which may be unrealistic). This is a spin on the case above: you change your input data, say the xs in a linear model, to see what the ys would look like given the other posterior parameters in the trace. It's a shortcut for:

     ```python
     new_ys = scipy.stats.norm(trace['intercept'] + new_xs * trace['slope'], trace['sigma']).rvs()
     ```
  3. Sample a completely new, unobserved distribution that depends on posterior parameters. Say you want to see what a new group in a hierarchical model would look like. This is similar to the case above. (I didn't know pymc3 could do this until very recently!) In a simple normal hierarchical model, this is a shortcut for:

     ```python
     new_group = scipy.stats.norm(trace['group_mu'], trace['group_sigma']).rvs()
     ```
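The three shortcuts above can be sketched end-to-end with plain NumPy/SciPy. This is a hypothetical example: the `trace` dict stands in for the posterior draws you would normally get from `pm.sample()`, and all parameter values are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical stand-in for a PyMC trace: 1000 posterior draws per parameter.
# In a real workflow these arrays would come from pm.sample().
n_draws = 1000
trace = {
    'intercept': rng.normal(1.0, 0.1, n_draws),
    'slope': rng.normal(2.0, 0.1, n_draws),
    'sigma': np.abs(rng.normal(0.5, 0.05, n_draws)),
    'group_mu': rng.normal(0.0, 0.2, n_draws),
    'group_sigma': np.abs(rng.normal(1.0, 0.1, n_draws)),
}

# Case 1: replicate the observed dataset (one y per old x, per posterior draw).
old_xs = np.linspace(0.0, 1.0, 20)
mu_old = trace['intercept'][:, None] + old_xs[None, :] * trace['slope'][:, None]
ppc_ys = stats.norm(mu_old, trace['sigma'][:, None]).rvs(random_state=rng)

# Case 2: same posterior, new inputs.
new_xs = np.array([1.5, 2.0])
mu_new = trace['intercept'][:, None] + new_xs[None, :] * trace['slope'][:, None]
new_ys = stats.norm(mu_new, trace['sigma'][:, None]).rvs(random_state=rng)

# Case 3: a brand-new group drawn from the hierarchical hyperparameters.
new_group = stats.norm(trace['group_mu'], trace['group_sigma']).rvs(random_state=rng)

print(ppc_ys.shape, new_ys.shape, new_group.shape)
# One predictive row per posterior draw: (1000, 20) (1000, 2) (1000,)
```

Note the broadcasting: each posterior draw is paired with every input, so the predictive samples carry the full posterior uncertainty, not just point estimates.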

When is it not helpful? (Because it simply can't do these things; that's why MCMC was needed in the first place.)

  1. See how the posterior of unobserved variables would change if you changed your data (including adding a new observation group in a hierarchical model)

  2. See how the posterior of unobserved variables would change if you changed the prior or posterior of other parameters (as in your example where you changed the mean of a group in a hierarchical model).
