Sample_posterior_predictive

https://docs.pymc.io/notebooks/posterior_predictive.html

In these examples, the trace contains 2 chains, each of 5000 posterior samples.

Which chain (in the trace) does pm.sample_posterior_predictive use to generate the 500 data sets? If the chains are very different (unlike in this example, where the chains are very close), this makes a difference.

If the chains in the trace are very different, is there any way to pass the trace of one particular chain to sample_posterior_predictive, so that the generated data sets correspond to the samples of that chain only?

It goes through each chain one at a time. However, you are not forced to supply a MultiTrace instance; you can supply a list of point dictionaries instead. You can get those from the trace easily by doing the following:

import pymc3 as pm

with pm.Model():
    ...
    trace = pm.sample()
    # Flatten the trace into a DataFrame with one row per posterior sample
    df = pm.trace_to_dataframe(trace,
                               varnames=[...],  # the variables you want
                               include_transformed=True)
    # We have to supply the samples kwarg because it cannot be inferred if the
    # input trace is not a MultiTrace instance
    ppc = pm.sample_posterior_predictive(trace=df.to_dict('records'),
                                         samples=len(df))

As you can see, we first get a dataframe and then convert it to a list of record dicts. At the dataframe level, you can do any indexing or chain manipulation you want or need before supplying it to sample_posterior_predictive.
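To make the dataframe-level chain selection concrete, here is a minimal sketch. It uses a synthetic DataFrame as a stand-in for the output of pm.trace_to_dataframe, and it assumes the chains are stacked row-wise in chain order with the same number of draws each (worth checking for your own trace). The variable names mu and sigma and the helper chain_points are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd

n_chains, n_draws = 2, 5000  # matches the 2 chains x 5000 samples in the question

# Synthetic stand-in for pm.trace_to_dataframe(trace).
# ASSUMPTION: rows are stacked chain after chain (chain 0 first, then chain 1).
# "mu" and "sigma" are hypothetical variable names.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "mu": np.concatenate([rng.normal(0.0, 1.0, n_draws),
                          rng.normal(5.0, 1.0, n_draws)]),
    "sigma": np.abs(rng.normal(1.0, 0.1, n_chains * n_draws)),
})


def chain_points(df, chain, n_draws):
    """Return the rows of one chain as a list of point dictionaries."""
    start = chain * n_draws
    return df.iloc[start:start + n_draws].to_dict("records")


# Keep only the samples that belong to the second chain
points = chain_points(df, chain=1, n_draws=n_draws)
```

The resulting list could then be passed as trace=points together with samples=len(points), in the same way as df.to_dict('records') in the snippet above.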

A related question was asked here.
