https://docs.pymc.io/notebooks/posterior_predictive.html
In these examples, the trace contains 2 chains, each with 5000 posterior samples.
Which chain (in the trace) does pm.sample_posterior_predictive use to generate the 500 data sets? If the chains are very different (unlike in this example, where they are very close), this makes a difference.
If the chains in the trace are very different, is there any way to pass a particular chain of the trace to sample_posterior_predictive, so that the generated data sets correspond only to that chain's samples?
It goes through the chains one at a time. However, you are not forced to supply a MultiTrace instance; you can also supply a list of point dictionaries. You can easily get those from the trace as follows:
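```python
import pymc3 as pm

with pm.Model() as model:
    ...  # define your model here
    trace = pm.sample()

    df = pm.trace_to_dataframe(trace,
                               varnames=["mu", "sigma"],  # hypothetical names: the variables you want
                               include_transformed=True)

    # We have to supply the samples kwarg because it cannot be inferred if the
    # input trace is not a MultiTrace instance.
    # Calling inside the model context tells PyMC which model to sample from.
    ppc = pm.sample_posterior_predictive(trace=df.to_dict('records'),
                                         samples=len(df))
```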
As you can see, we first get a DataFrame and convert it to a list of record dicts. At the DataFrame level, you can do any indexing or chain manipulation you need before passing the records to sample_posterior_predictive.
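To illustrate the idea without depending on a particular PyMC version, here is a minimal pandas/NumPy sketch: keep only one chain's rows, convert them to records, and simulate one data set per posterior point. It assumes a hypothetical model with scalar parameters `mu` and `sigma` and a Normal likelihood; the draws and the `chain` column are synthetic stand-ins, and the `posterior_predictive` helper is a hand-rolled illustration, not how sample_posterior_predictive is implemented. With a real PyMC3 trace, `pm.trace_to_dataframe` also takes a `chains` argument (e.g. `chains=[0]`), which should give the same restriction directly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic stand-in for posterior draws: 2 chains x 5 draws of two scalar
# parameters. The "chain" column is an assumption made for this sketch.
draws = pd.DataFrame({
    "chain": np.repeat([0, 1], 5),
    "mu": np.concatenate([rng.normal(0.0, 0.1, 5),    # chain 0 centred near 0
                          rng.normal(3.0, 0.1, 5)]),  # chain 1 centred near 3
    "sigma": rng.uniform(0.5, 1.5, 10),
})


def posterior_predictive(points, n_obs, rng):
    """Hypothetical helper: one simulated data set per posterior point,
    assuming a Normal(mu, sigma) likelihood."""
    return np.array([rng.normal(p["mu"], p["sigma"], size=n_obs) for p in points])


# Keep only chain 0, drop the bookkeeping column, convert to point dicts.
chain0 = draws[draws["chain"] == 0].drop(columns="chain")
points = chain0.to_dict("records")

ppc = posterior_predictive(points, n_obs=20, rng=rng)
print(ppc.shape)  # (5, 20): one data set of 20 observations per chain-0 draw
```

Because the filtering happens before the records are built, the simulated data sets reflect only the selected chain; the same DataFrame could just as well be sliced by draw index or thinned before conversion.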
A related question was asked here.