A problem interpreting an error in PyMC

Good morning,

It’s been a long time since I last used PyMC and I realize that I have mostly forgotten how programs should be properly written!

To address a Gamma-Poisson problem that interests me, I was inspired by the question asked and addressed here: HERE

So, I adjusted the instructions given by jlindbloom a bit, based on what I remember, and figured I could write the program below:

import pymc as pm
import arviz as az

with pm.Model() as model:
    gamma = pm.Gamma('gamma', alpha=1, beta=1) # Prior
    p_obs = pm.Poisson('p_obs', mu=gamma, observed=[1]) # Likelihood

with model:
    idata = pm.sample(tune=1000,draws=5000,target_accept=0.9)
    idata_prior = pm.sample_prior_predictive(var_names=["gamma"],
                                          return_inferencedata=True)
    idata_posterior = pm.sample_posterior_predictive(idata, var_names=["p_obs"],
                                                     return_inferencedata=True,
                                                     extend_inferencedata=True,
                                                     predictions=False)

idata.extend(idata_prior)
idata.extend(idata_posterior)
    
az.plot_ppc(idata_prior, var_names=["p_obs"])      
az.plot_ppc(idata_posterior, var_names=["gamma"])

Unfortunately, what I feared happened, in the form of the error:

TypeError                                 Traceback (most recent call last)
Cell In[22], line 20
     16 idata.extend(idata_prior)
     17 idata.extend(idata_posterior)
---> 20 az.plot_ppc(idata_prior, var_names=["p_obs"])   
     22 az.plot_ppc(idata_posterior, var_names=["gamma"])
...
TypeError: `data` argument must have the group "posterior_predictive" for ppcplot.

and, of course, I have completely forgotten what “must have the group "posterior_predictive" for ppcplot” means and how to fix it…

I come to you for some advice and ask you to bear with me on this not-so-smart question; I’ll try to find some notes I must have taken on the subject.

I think you can’t pass the prior dataset to az.plot_ppc, since it is supposed to work with posterior predictive datasets, not priors.

Never mind, the ArviZ docs say it works with prior_predictive as well; the error message just does not reflect this. Try calling pm.sample_prior_predictive without setting var_names (or include p_obs in it as well) so that it also samples the observed variable. plot_ppc probably needs that one not to be empty.

Thanks ricardoV94, but something still remains unclear; please look:

arviz.plot_ppc takes (among other arguments): data, var_names and group.

  • data is an object containing the observed and posterior/prior predictive data
    I understand the sentence as “and posterior predictive OR prior predictive data”
  • var_names is optional (if None, all variables are plotted)
  • group may be “prior” or “posterior”, and there is no mention of group “posterior_predictive”…
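
One way to see how these three arguments interact, sketched with synthetic draws rather than a PyMC run. The object idata_demo and the simulated arrays are assumptions made up for this illustration, and the call signatures are those of the ArviZ 0.x releases that provide az.plot_ppc. The group argument appears to decide which predictive group is read: "posterior" points at posterior_predictive and "prior" at prior_predictive, which would explain why the error message names a group that the description of group does not list.

```python
import arviz as az
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for prior draws: gamma ~ Gamma(1, 1), p_obs ~ Poisson(gamma).
gamma_draws = rng.gamma(shape=1.0, scale=1.0, size=(1, 500))  # (chain, draw)
p_obs_draws = rng.poisson(gamma_draws)[..., np.newaxis]       # (chain, draw, obs)

idata_demo = az.from_dict(
    prior={"gamma": gamma_draws},
    prior_predictive={"p_obs": p_obs_draws},
    observed_data={"p_obs": np.array([1])},
)

# Default group="posterior": there is no posterior_predictive group here.
try:
    az.plot_ppc(idata_demo)
    default_group_error = None
except TypeError as err:
    default_group_error = str(err)
print(default_group_error)

# group="prior" reads prior_predictive (and observed_data) instead.
axes = az.plot_ppc(idata_demo, group="prior", num_pp_samples=50)
```

The first call should reproduce the TypeError quoted in this thread, while the second one only needs the prior_predictive and observed_data groups.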

Running my program without the two lines that call az.plot_ppc leads to:

idata:
• posterior
• posterior_predictive
• sample_stats
• prior
• observed_data
idata_prior:
• prior
• observed_data
idata_posterior:
• posterior
• posterior_predictive
• sample_stats
• prior
• observed_data

Now, running the 2 lines:

az.plot_ppc(idata_prior, var_names=None)      
az.plot_ppc(idata_posterior, var_names=None)

still leads to the same error (apparently linked to the line az.plot_ppc(idata_prior, var_names=None)):

TypeError                                 Traceback (most recent call last)
---> 20 az.plot_ppc(idata_prior, var_names=None)
TypeError: `data` argument must have the group "posterior_predictive" for ppcplot

But I don’t understand what this “posterior_predictive” group has to do with the idata_prior results. So, finally, what is clearly inaccurate or wrong in the program I wrote, what did I obviously forget to include, and how could I correct it?

Many thanks for any indication.

I meant var_names in the call to pm.sample_prior_predictive. Note that there is no prior_predictive group in your idata_prior (only a prior group), because you didn’t sample any observed variables in pm.sample_prior_predictive.
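
To make that concrete, here is a small diagnostic sketch. The helper missing_ppc_groups is hypothetical (it is not part of ArviZ or PyMC) and encodes an assumption read off the error message and the documentation quoted above: that plot_ppc needs observed_data plus the predictive group matching its group argument. The group list is the one reported earlier in this thread for idata_prior.

```python
def missing_ppc_groups(available_groups, group="posterior"):
    """Return the groups plot_ppc would presumably complain about.

    Assumption: plot_ppc reads "<group>_predictive" and "observed_data".
    """
    required = [f"{group}_predictive", "observed_data"]
    return [name for name in required if name not in available_groups]


# Groups reported in the thread for idata_prior (sampled with var_names=["gamma"]).
idata_prior_groups = ["prior", "observed_data"]

# Default group="posterior": this is the group named in the TypeError.
print(missing_ppc_groups(idata_prior_groups))
# → ['posterior_predictive']

# group="prior": a prior_predictive group would be needed instead.
print(missing_ppc_groups(idata_prior_groups, group="prior"))
# → ['prior_predictive']
```

With a real object, the list of available groups could come from idata_prior.groups().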

Hi ricardoV94, I’m sorry; I feel completely obtuse and can’t understand what’s going on.

I know I’m exaggerating by asking this, but could you do me a favor?

I tried each of these three lines:

    idata_prior = pm.sample_prior_predictive(var_names=["gamma", "p_obs"],
                                             return_inferencedata=True) 
    idata_prior = pm.sample_prior_predictive(var_names=None,
                                             return_inferencedata=True) 
    idata_prior = pm.sample_prior_predictive(return_inferencedata=True)

Each of them led to the same error (as already mentioned). Now, I am really stuck with the kind of instruction I should use…

Could you elaborate a bit more and give me more hints on what the right solution is to get rid of this error?

I really don’t know how to move forward in this matter.

Thanks in advance