Number of samples generated by `sample_posterior_predictive`

I have a question about the number of samples generated by sample_posterior_predictive.
Suppose I have fitted my model and obtained a posterior trace with 4 chains of 1000 samples each (= 4000 samples from the posterior in total). Now, I would like to make posterior predictions and compare them to my observations. Specifically, I’d like to create a plot like the one used in the Principled Bayesian Workflow by Betancourt: https://betanalpha.github.io/assets/case_studies/principled_bayesian_workflow.html#step_fourteen:_posterior_retrodictive_checks

To do that, I need to create multiple samples for my observed variable for each of the posterior samples. For instance, I’d like to create 100 samples for each of the 4000 samples from the posterior.

Now I’m a bit puzzled by the samples parameter of the sample_posterior_predictive function:

  1. The documentation says: “It is not recommended to modify this value; when modified, some chains may not be represented in the posterior predictive sample.” I don’t understand that statement. To me, it seems straightforward to say that I’d like to draw, e.g., 100 samples for each posterior sample. Why do we risk not using some chains?
  2. When I set, e.g., samples=8000, I get a posterior prediction of shape (8000,). However, I would like to get a shape of (4000, 2), indicating that in this case I took 2 samples for each posterior sample. Ideally, I would expect the function to accept something like samples_per_posterior_sample=2.

Am I misunderstanding something here?
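
Both points can be illustrated with a small sketch. It assumes, purely for illustration, that a flat `samples` count is mapped onto the trace chain by chain and wraps around once every chain has been used. The helper `chain_and_draw` is hypothetical and is not PyMC3's actual code.

```python
# Hypothetical sketch of how a flat `samples` count might map onto chains.
# Assumption: draw indices 0..samples-1 are walked chain by chain and wrap
# around once every chain has been used. This is not PyMC3's actual code.

def chain_and_draw(idx, n_chains, draws_per_chain):
    """Map a flat index to a (chain, draw) pair under the assumption above."""
    chain, draw = divmod(idx, draws_per_chain)
    return chain % n_chains, draw

n_chains, draws_per_chain = 4, 1000

# Fewer samples than the trace holds: only the first chain is ever visited.
chains_used_small = {
    chain_and_draw(i, n_chains, draws_per_chain)[0] for i in range(1000)
}

# samples=8000: every (chain, draw) pair is visited exactly twice,
# but the result is still one flat axis of length 8000.
pairs = [chain_and_draw(i, n_chains, draws_per_chain) for i in range(8000)]
visits_first_pair = pairs.count((0, 0))
same_posterior_sample = pairs[0] == pairs[4000]

print(chains_used_small)
print(visits_first_pair, same_posterior_sample)
```

Under that assumption, samples=1000 would only ever touch one chain, and samples=8000 would reuse each posterior sample twice without adding a second axis, which matches the (8000,) shape described above.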


You can pass a size argument to sample_posterior_predictive(). This allows you to specify:

“The number of random draws from the distribution specified by the parameters in each sample of the trace. Not recommended unless more than ndraws times nchains posterior predictive samples are needed.”

API reference is here.
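
As a rough picture of what "draws per sample of the trace" means, here is a minimal pure-Python sketch. The normal likelihood, the parameter name `mu`, and the helper `predictive_draws` are illustrative assumptions and not part of PyMC3. In PyMC3 itself the corresponding call would presumably be `pm.sample_posterior_predictive(trace, size=100)`; the exact shape of its output is not verified here.

```python
import random

# Stand-in for a posterior trace: 4 chains x 1000 draws of one parameter `mu`.
# The normal likelihood and all names here are illustrative assumptions.
rng = random.Random(0)
posterior_mu = [rng.gauss(0.0, 1.0) for _ in range(4 * 1000)]

def predictive_draws(mu_samples, size, sigma=1.0):
    """For every posterior sample, draw `size` values from Normal(mu, sigma)."""
    return [[rng.gauss(mu, sigma) for _ in range(size)] for mu in mu_samples]

ppc = predictive_draws(posterior_mu, size=100)

# One row per posterior sample, `size` columns per row.
print(len(ppc), len(ppc[0]))
```

Each row holds the repeated predictive draws for one posterior sample, which is the layout needed for the retrodictive-check plot mentioned in the question.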


Oh, I should have looked a bit further in the documentation. Thanks a lot for the quick help.

Hi all,

I have a question about the sample_posterior_predictive function. I want to compute the weights (probabilities) of the predicted samples, so I used the function
“pymc3.sampling.sample_posterior_predictive_w(trace)”.
When I try to run it, I get the following error:

AttributeError Traceback (most recent call last)
in
----> 1 ppc_w = pm.sample_posterior_predictive_w(trace, 1000, model_particle,progressbar=False)

~/.local/lib/python3.6/site-packages/pymc3/sampling.py in sample_posterior_predictive_w(traces, samples, models, weights, random_seed, progressbar)
1786 traces = [dataset_to_point_dict(trace) for trace in traces]
1787 else:
-> 1788 n_samples = [len(i) * i.nchains for i in traces]
1789
1790 if models is None:

~/.local/lib/python3.6/site-packages/pymc3/sampling.py in <listcomp>(.0)
1786 traces = [dataset_to_point_dict(trace) for trace in traces]
1787 else:
-> 1788 n_samples = [len(i) * i.nchains for i in traces]
1789
1790 if models is None:

AttributeError: 'dict' object has no attribute 'nchains'

Can anybody help me resolve this error? Thanks in advance.
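
The traceback shows the function looping over `traces` and reading `.nchains` from each element, so it appears to expect a list of traces (and, by the parameter name `models`, a list of models) rather than a single trace. The sketch below reproduces that failure with a hypothetical `FakeTrace` class standing in for a PyMC3 trace. The stand-in assumes that iterating over a trace yields one dict per draw, which is what the error message suggests.

```python
# Hypothetical stand-in for a PyMC3 trace: iterating over it yields plain
# dicts (one per draw), while the trace object itself carries `nchains`.
class FakeTrace:
    nchains = 4

    def __init__(self, draws):
        self._draws = draws

    def __len__(self):
        return len(self._draws)

    def __iter__(self):
        return iter(self._draws)

def count_samples(traces):
    # Same expression as line 1788 in the traceback above.
    return [len(i) * i.nchains for i in traces]

trace = FakeTrace([{"mu": 0.1}, {"mu": 0.2}, {"mu": 0.3}])

try:
    count_samples(trace)  # bare trace: each `i` is a dict
except AttributeError as err:
    error_message = str(err)

n_samples = count_samples([trace])  # list of traces: each `i` is a trace

print(error_message)
print(n_samples)
```

If that reading is right, wrapping both arguments in lists, as in `pm.sample_posterior_predictive_w([trace], 1000, [model_particle], progressbar=False)`, should get past this particular error.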