How to change number of samples drawn by sample_posterior_predictive

Hi everybody! I’m working through the Bayesian Modeling and Computation in Python book, and I ran into an issue trying to solve problem 1H23. I defined the model below and ran it on five made-up observations (Y). I’d like to create a posterior predictive with 10 draws rather than 5. How should I go about it?

import pymc3 as pm

Y = [0, 1, 0, 0, 1]

with pm.Model() as model:
    # Specify the prior distribution of the unknown parameter
    θ = pm.Beta("θ", 1, 1)

    # Specify the likelihood distribution and condition on the observed data
    obs = pm.Binomial("Obs", n=1, p=θ, observed=Y)

    trace = pm.sample(1000, return_inferencedata=False)
    posterior_checks = pm.sample_posterior_predictive(trace, samples=1, size=10)

The code above generates a posterior predictive in which each data set still has five points (one sample, size ten, five observations):

posterior_checks['Obs'].shape
(1, 10, 5)

while I’d like to get the shape (1, 10, 10). More generally, how can I change the number of draws per credible data set when building posterior predictive distributions for models like this?

Thanks for your help!!!


It’s not entirely clear to me what you are trying to do here, so I apologize if my answer isn’t what you are looking for. In general, it’s ideal to pull an arbitrary number of samples from your trace and generate a credible set of observations for each sample. So if you want to see what your data could have looked like (conditional on your model, priors, and the data you used to sample in the first place), you can generate N credible data sets:

N = 100

# draw one credible data set for each of N posterior samples
with model:
    posterior_checks = pm.sample_posterior_predictive(trace, samples=N)
print(posterior_checks['Obs'].shape)
# (100, 5)

Unless I have a strong reason to do otherwise, I typically generate as many credible data sets as I have samples, because I don’t have to use all of them if I don’t want to (e.g., I can always use a random subset of these data sets; see the sketch after the next snippet). Plus, posterior predictive sampling is fast:

# N is now implied by the number of chains/samples in trace
with model:
    posterior_checks = pm.sample_posterior_predictive(trace)

print(posterior_checks['Obs'].shape)
# (2000, 5)
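
And if you only want a subset of those data sets, something like this works (a quick sketch, assuming numpy is imported as np and reusing the (2000, 5) array from above):

import numpy as np

# pick 100 of the 2000 credible data sets at random, without replacement
idx = np.random.choice(posterior_checks['Obs'].shape[0], size=100, replace=False)
subset = posterior_checks['Obs'][idx]

print(subset.shape)
# (100, 5)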

If you need/want more credible data sets than you have samples in your trace, my suggestion is to sample for longer so that you have at least as many samples in your trace as you need credible data sets. If sampling is very expensive (i.e., your model and/or your data set is huge), it is possible to get additional credible data sets by using the size argument, but it’s not recommended.
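
For completeness, using the size argument (the same one from your original snippet) would look roughly like this; each posterior sample just gets reused size times, which is why I don’t recommend it:

# not recommended: reuse each posterior sample `size` times
with model:
    posterior_checks = pm.sample_posterior_predictive(trace, samples=100, size=2)

print(posterior_checks['Obs'].shape)
# (100, 2, 5) -- 100 samples, 2 replicate data sets per sample, 5 observations each
# collapsing the last two axes would give 100 data sets of 10 points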

Thanks Christian! Sorry I didn’t formulate my question well. What I’d like to do is change the number of points in the credible data sets generated by sample_posterior_predictive. In my example this number is always 5 (the same as in the observed data Y), but I’d like to increase it to 10 (just as an example). In this particular case I should be able to “lump” two credible data sets of 5 observations each to get what I want (see the snippet below), but that seems like poor practice in the general case. Did I clarify it at all?
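
For instance, this is what I mean by lumping (just a sketch, reusing the (2000, 5) array from your example):

import numpy as np

# concatenate consecutive pairs of 5-point data sets into 1000 data sets of 10 points
obs_draws = posterior_checks['Obs']   # shape (2000, 5)
lumped = obs_draws.reshape(1000, 10)

print(lumped.shape)
# (1000, 10)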

In that case, you might just take the samples from your trace and use them directly.

import numpy as np
import scipy.stats

# how many credible data sets do we want?
n_datasets = 4

# how big is each?
n_obsperdataset = 10

# this is the number of 'flips' used in the model specification
n = 1

# grab some samples from the trace
thetas = np.random.choice(trace['θ'], size=n_datasets, replace=False)

# for each sample, generate a credible data set
for theta in thetas:
    print('new dataset:\t' + str(scipy.stats.binom(n, p=theta).rvs(size=n_obsperdataset)))

You’ll obviously want to do something more useful than just printing the data sets, but hopefully that helps.
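
For example, you could stack them into a single array instead of printing (a quick sketch along the same lines, reusing thetas, n, and n_obsperdataset from above):

import numpy as np
import scipy.stats

# collect the credible data sets into an array of shape (n_datasets, n_obsperdataset)
datasets = np.stack([
    scipy.stats.binom(n, p=theta).rvs(size=n_obsperdataset)
    for theta in thetas
])

print(datasets.shape)
# (4, 10)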

I see - yes, that totally makes sense! Thank you!!