Difference between using entire trace versus sampling from the trace first

Hello,

Is there a difference in, or benefit to, using the entire trace from an MCMC run versus taking samples from the trace first (using pm.sample_posterior_predictive) to make inferences?

I’m not sure I quite understand your question given the way it’s phrased, so please take this as a first attempt to clarify:

  1. The trace records the position of the sampler in the joint posterior parameter space at each step (draw) of the MCMC sampling. The aim of MCMC is to sample from the region of this space that holds most of the posterior probability, i.e. where the sampler is stable aka well mixed.
  2. Once the sampler is inside this region aka the traces are well mixed, it should be possible to run the sampling indefinitely longer without leaving this region (I’m sure there are mathematical exceptions to this bold statement), so the ‘final’ sample is arbitrary and the length of the trace (draws) is your choice.
  3. The start point of the trace aka initialization is not arbitrary and can be chosen through a number of methods; the default for pm.sample is init=jitter+adapt_diag (see the Inference page of the PyMC3 3.11.2 documentation). The sampler has to explore from this initialization towards the region of higher density, and these ‘tuning’ aka ‘burn-in’ samples are usually discarded and not used for anything downstream.
  4. I understand your statement “to make inferences” to mean viewing the posterior parameter distributions, like you might do using arviz.plot_posterior (see the arviz.plot_posterior page of the ArviZ dev documentation). For this you need samples from the well mixed, stable region of the sampling, and you simply need a trace with enough samples to yield the precision you want to quote. E.g. a trace of 1000 samples lets you resolve a point on that distribution in steps of about 1/1000, aka 0.001 aka 0.1%, although the precision you actually achieve also depends on how autocorrelated the samples are (the effective sample size). You can set this using pm.sample(draws=1000, ...).
  5. Using pm.sample_posterior_predictive is quite different and uses the traces to probabilistically generate new synthetic data. For this you can use a trace of any length, but in practice it makes very good sense to use the same trace (per #4 above) that you use for inference.
  6. pm.sample_posterior_predictive does give you the option to select a different number of samples from the trace, i.e. to undersample or even oversample from the full trace. However, per the documentation (see below), the PyMC3 devs’ recommendation is to set this number differently only if you have a specific reason; otherwise you should accept the default:
**samples** int

Number of posterior predictive samples to generate. Defaults to one posterior predictive sample per posterior sample, that is, the number of draws times the number of chains. It is not recommended to modify this value; when modified, some chains may not be represented in the posterior predictive sample.
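
To make the points above concrete, here is a minimal sketch in plain Python (standard library only) rather than PyMC3 itself. Everything in it is an assumption for illustration: the toy model (normal data with known sigma = 1 and a flat prior on the mean), the hand-rolled random-walk Metropolis sampler, and the hypothetical names metropolis_chain, trace, ppc_full and ppc_subset. It is not how PyMC3 samples internally (PyMC3 uses NUTS by default), and the naive subset at the end is only meant to show why taking fewer predictive samples can leave a chain unrepresented.

```python
import math
import random
import statistics


def metropolis_chain(data, n_tune, n_draws, start, seed, step=0.5):
    """Toy random-walk Metropolis sampler for mu in Normal(mu, 1), flat prior."""
    rng = random.Random(seed)

    def log_post(mu):
        # Log-posterior up to a constant (sigma fixed at 1, flat prior on mu).
        return -0.5 * sum((x - mu) ** 2 for x in data)

    mu, current_lp = start, log_post(start)
    samples = []
    for _ in range(n_tune + n_draws):
        proposal = mu + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            mu, current_lp = proposal, proposal_lp
        samples.append(mu)  # recorded at every step, even after a rejection
    return samples[n_tune:]  # drop the tuning / burn-in steps


# Synthetic observations, assumed for illustration only.
data_rng = random.Random(0)
data = [data_rng.gauss(3.0, 1.0) for _ in range(50)]

# Two chains started far from the high-density region (points 2 and 3).
chains = [
    metropolis_chain(data, n_tune=500, n_draws=1000, start=s, seed=i)
    for i, s in enumerate([-5.0, 10.0])
]
trace = [mu for chain in chains for mu in chain]  # draws x chains samples

# Inference on the parameter uses the whole post-tuning trace (point 4).
posterior_mean = statistics.mean(trace)

# Posterior predictive: one synthetic observation per posterior sample (point 5).
pp_rng = random.Random(42)
ppc_full = [pp_rng.gauss(mu, 1.0) for mu in trace]

# A naive subset of the first 500 samples only ever sees chain 0 (point 6).
ppc_subset = [pp_rng.gauss(mu, 1.0) for mu in trace[:500]]

print(len(trace), len(ppc_full), len(ppc_subset))
print(round(posterior_mean, 2), round(statistics.mean(ppc_full), 2))
```

In this sketch the inference step summarises the parameter mu directly from the trace, while the posterior predictive step pushes every sample of mu back through the likelihood to simulate new data, so the two answer different questions even though they share the same trace.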

Does this help?
