InferenceData incomplete

Using the very simple model shown below, why does InferenceData contain no prior_predictive group? It would be very helpful if someone could explain what I have misunderstood about this model.

import pymc as pm

with pm.Model() as model:
    p = pm.Beta("p", alpha=1, beta=1)
    y_obs = pm.Binomial("y_obs", p=p, n=10, observed=7)
    idata = pm.sample(1000)
    pm.sample_prior_predictive(return_inferencedata=True)
    pm.sample_posterior_predictive(idata,
                                   return_inferencedata=True,
                                   extend_inferencedata=True,
                                   predictions=False)
idata

NOTE: in a previous post, OriolAbril kindly gave me this guidance: “I now see you are not saving the results of prior predictive sampling anywhere. Potentially useful references: PyMC 4.0 with labeled coords and dims (Oriol unraveled), Prior and Posterior Predictive Checks (PyMC 5.1.2 documentation)”.

Unfortunately, I can’t find my way around these documents: I can’t manage to use them to reformulate my program effectively… There’s something I don’t understand but can’t explain. Any suggestion welcome.

Try idata.extend(pm.sample_prior_predictive()). See arviz.InferenceData.extend (ArviZ 0.14.0 documentation).

Nice suggestion! The prior_predictive group is now included in idata. But wouldn’t it be simpler if pm.sample_prior_predictive(return_inferencedata=True) also accepted extend_inferencedata=True, so that one could write, for example, pm.sample_prior_predictive(return_inferencedata=True, extend_inferencedata=True)? At the moment this raises an error, which does not seem very logical, right?

I had a similar problem when using pm.to_inference_data(). If the trace passed as an argument is already an InferenceData object, then it is simply returned and any other arguments are ignored (see the source code). I feel like the least surprising result would be for all arguments to be used. Maybe this would require extra arguments to control how the various fields are merged.

That is because you’re not passing in idata, so the function does not have access to it. I usually just use idata.extend() for everything, including the output of pm.sample_posterior_predictive().

Hi twiecki, I understand that the results have to be passed to idata somehow, but the documentation for pymc.sample_prior_predictive (HERE) gives the following signature:

pymc.sample_prior_predictive(samples=500, model=None, var_names=None, random_seed=None, return_inferencedata=True, idata_kwargs=None, compile_kwargs=None)

Given this signature, I don’t really see what you have in mind when you say: “It is because you’re not passing in idata”.

What do you think would be the way to pass idata to pymc.sample_prior_predictive without using idata.extend()?

Currently there isn’t a way, but it’s a valid feature request. You could open an issue (feature request) on GitHub.

@Andre where do you see in the docstrings that it accepts an InferenceData as input? It accepts the arguments used to specify the returned InferenceData, it doesn’t say anywhere you can pass one as input.

Or do you mean something else?

Hi ricardoV94, by “it” in “…that it accepts…”, I guess you mean pymc.sample_prior_predictive, right?

Well, yes, it doesn’t accept one, unlike pymc.sample_posterior_predictive, which does, if I understand correctly: pymc.sample_posterior_predictive(trace, ...), right?

But maybe I didn’t quite understand your question?

Yes. I got the impression from your message above that you were saying sample_prior_predictive should already accept InferenceData as input because of the documentation. But I think we are on the same page?

Just to be sure, the page I was referring to is THIS ONE

Okay, so to clarify: posterior_predictive doesn’t simply accept InferenceData, it needs it! That’s why it’s an argument to it. prior_predictive has no intrinsic need for an InferenceData; it creates new samples from nothing. That’s why it’s not an input.
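To make the distinction concrete, here is a hand-rolled NumPy sketch for the Beta-Binomial model from the question. It is not how PyMC implements these functions; it only illustrates that prior predictive draws need the model alone, while posterior predictive draws need posterior samples of p, which in turn depend on the observed data (7 successes out of 10). The closed-form posterior Beta(1 + 7, 1 + 3) is available here only because this particular model is conjugate.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 500

# Prior predictive: only the model is needed.
# p ~ Beta(1, 1), then y ~ Binomial(n=10, p).
p_prior = rng.beta(1.0, 1.0, size=n_draws)
y_prior_pred = rng.binomial(10, p_prior)

# Posterior predictive: needs draws of p from the posterior,
# which depends on the observed data (7 successes, 3 failures).
# For this conjugate model the posterior is Beta(1 + 7, 1 + 3).
p_post = rng.beta(1.0 + 7, 1.0 + 3, size=n_draws)
y_post_pred = rng.binomial(10, p_post)

print("prior predictive mean:    ", y_prior_pred.mean())
print("posterior predictive mean:", y_post_pred.mean())
```

In PyMC the posterior draws come from the trace returned by pm.sample, which is why pm.sample_posterior_predictive takes that object as its first argument.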

Is there something that does not make sense?

Ok, now it makes sense. I realize that my questions were actually about something I had misunderstood.

A prior is something that obviously predates the data. So it has “no intrinsic need for an InferenceData”, as you rightly said, since “inferred” means “concluded from”…

Much clearer. Now I have to learn how to work properly with the idata.prior_predictive group in order to check how credible my prior assumptions are…
