Posterior Predictive Sampling -- Works for Hierarchical Model?

In the code for sample_prior_predictive, if I understand it correctly, PyMC3 picks points from the trace, and then collects samples from the output variables (or, more generally, the variables the caller specifies), given the values of the variables in the point.
There’s a notebook that shows how to use this with a regression model, which is able to generalize (predict values for unseen combinations of predictors).
Again, IIUC, this works because of the following dependency structure:
image
The key feature here is that the regression weights are independent of the predictors. So we can sample from the posterior predictive by pulling a point (the variables in the upper right of the above figure), and from them, the independent predictors, and the deterministic formulas, we can generate random observations.
But what happens if we have a hierarchical model that has a structure with the following shape:
image
For example, maybe there’s a categorical variable, representing cluster membership, that is sampled with a conditional probability conditioned on the predictor variables. In this case, the latent value in the center is dependent upon the predictor values, and should be chosen from the posterior based on the predictor values, rather than plucked from the trace independent of the predictors.
In this case, won’t sample_posterior_predictive() generate incorrect samples? Or have I misread something?
If I am right, then PyMC3 cannot generate posterior predictions of unseen predictor combinations in cases like the hierarchical model. PyMC3 can only generate posterior predictions correctly if the parameters are all independent of the predictors.

5 Likes

Yes, you’re right. In this case the samples will not match what should be expected given the conditional dependency between the predictors and latent variables.

This doesn’t mean the hierarchical regression models usually used in pymc3 are wrong. The model structure is something like

\mu_{0} \sim Normal(...) \\ \beta_{i} \sim Normal(mu,...) \\ y \sim Normal(X.\vec{\beta},...)

So the parameters are all independent from the predictors.

Then in gaussian mixture models, the latent variables that represent class membership are marginalized out, so again no problems with posterior predictive sampling.

In the cases where you do collect samples from the latent variable, the solution to perform correct out of sample predictions is to not supply the entire trace to sample_posterior_predictive. You can pass a list of point dictionaries instead of a trace, and you just need to drop some keys from said dictionary (namely the latent variables and the nodes that conditionally depend on them). This is easily done converting the trace to a pandas dataframe:

    trace = pm.sample()
    df = pm.trace_to_dataframe(trace,
                               varnames=[the variables you want],
                               include_transformed=True)
    # We have to supply the samples kwarg because it cannot be inferred if the
    # input trace is not a MultiTrace instance
    ppc = pm.sample_posterior_predictive(trace=df.to_dict('records'),
                                         samples=len(df))

Thanks. That’s very helpful. It does mean, though, that the notebook about posterior predictive sampling avoids the space where things are tricky, so users might try to do things with sample_posterior_predictive() that will quietly yield bad results.
I wish I knew how to formulate an explanation of this process to put in the documentation.

I’m really short on time. I still need to write down a notebook on shape handling and shape_utils, but I’ll try to push some things onto the posterior predictive sampling notebook when I get the chance.

There are some threads here that talk about posterior predictive sampling that can serve as material for the notebooks