Sorry about the late answer; hopefully it will still be useful.
The main motivation for this is making the defaults more sensible and nudging users (especially basic and average ones) towards best practices. In the vast majority of cases, one should draw one posterior predictive sample per posterior sample. Generating fewer samples means losing information, and generating more does not increase the precision. There are reasons to do both, but they should be done carefully.
There have been some v4 versions during which generating multiple posterior predictive samples per posterior draw wasn't directly possible, but it is now possible again (in main only for now) using `sample_dims`:
```python
# add a new dimension of length 5, repeating each posterior draw 5 times
expanded_data = idata.posterior.expand_dims(pred_id=5)
with model:
    idata.extend(pymc.sample_posterior_predictive(
        expanded_data, sample_dims=["chain", "draw", "pred_id"]
    ))
```
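The resulting `posterior_predictive` group will then have `chain`, `draw` and `pred_id` dimensions, with 5 predictive draws per posterior sample.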
v3 had a `size` argument to generate multiple draws per posterior sample, but it was removed in v4 because it generated more problems than solutions, had been broken for a while, and nobody complained.
The use of `samples` was also inconsistent between versions and might not do what users expect. Take the case when `samples` is smaller than `n_chains * n_draws`. Should PyMC take 1 every m draws, selecting all chains? 1 every k samples, flattening the chain and draw dimensions? The first draws/chains until we have `samples` draws? Or maybe random subsets, with or without repetition? And what about applications that need the "pairing" of each posterior predictive draw with the posterior draw that was used to generate it? v3 used at least two of these approaches.
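To make the ambiguity concrete, here is a rough sketch (not actual v3 code) of what each interpretation could look like with xarray, assuming a posterior with 4 chains of 1000 draws and `samples=100`:

```python
import numpy as np

post = idata.posterior  # assumed: 4 chains x 1000 draws
samples = 100

# 1) one every m draws, keeping all chains (4 chains x 25 draws = 100)
sub1 = post.sel(draw=slice(None, None, 40))

# 2) one every k samples after flattening chain and draw (4000 // 100 = 40)
flat = post.stack(sample=("chain", "draw"))
sub2 = flat.isel(sample=slice(None, None, 40))

# 3) the first `samples` draws of the flattened posterior
sub3 = flat.isel(sample=slice(0, samples))

# 4) a random subset, here without repetition
rng = np.random.default_rng()
idx = rng.choice(flat.sizes["sample"], size=samples, replace=False)
sub4 = flat.isel(sample=idx)
```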
In my opinion, the headaches caused by all of this, by `samples`, `size` and `keep_size`, outweigh the cost of moving this step to the user. Now, to generate 1 posterior predictive draw every 5 posterior samples you can do:
```python
# store the subsetted InferenceData
thinned_idata = idata.sel(draw=slice(None, None, 5))
with model:
    idata.extend(pymc.sample_posterior_predictive(thinned_idata))

# or do it inline, without storing the subset
with model:
    idata.extend(pymc.sample_posterior_predictive(
        idata.sel(draw=slice(None, None, 5))
    ))
```
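Note that `slice(None, None, 5)` keeps every 5th draw while preserving the chain and draw dimensions, so the pairing between each posterior predictive draw and the posterior draw that generated it is kept intact.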
Or, to generate posterior predictive samples for a random subset of the posterior, you can do:
```python
import arviz as az

post_subset = az.extract(idata, num_samples=100)
with model:
    idata.extend(pymc.sample_posterior_predictive(
        post_subset, sample_dims=["sample"]
    ))
```
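`az.extract` stacks the `chain` and `draw` dimensions into a single `sample` dimension before subsampling, which is why `sample_dims=["sample"]` is needed here.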
We do force users to be a bit more explicit than before, but none of these workflows is prohibited. And hopefully this will mean the result is what users expect more often (or always).