I’m a little embarrassed, because this question is not of great scientific interest; it simply reflects a gap in my knowledge.
I am trying to understand how to use a posterior predictive check (PPC) after building a Bayesian model with the PyMC library. I’ve never done this before and I don’t quite understand how to proceed.
Concretely, I would like to generate a large number of samples from the posterior predictive distribution and compare them to the sample I used to build the model, using a simple criterion: the maximum value of the samples. For this I refer to Ben Lambert’s video “What is a posterior predictive check and why is it useful?” on YouTube (at 8:15), where he presents a graph that expresses exactly what I would like to get:
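In other words, the test statistic I have in mind is just the maximum of each simulated dataset. With plain NumPy (on fake simulated data, just to illustrate what I mean), it would be:

```python
import numpy as np

rng = np.random.default_rng(0)
# 500 simulated datasets of 20 points each (stand-in for posterior predictive draws)
sims = rng.normal(600, 30, size=(500, 20))
# the test statistic: the maximum of each simulated dataset
T = sims.max(axis=1)
print(T.shape)  # (500,)
```

What I can’t work out is how to get an equivalent array of simulated datasets out of the PyMC output.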
So I built a simple model: I gave myself a sample of 20 values supposed to come from a normal distribution whose mean mu and standard deviation sigma I am trying to estimate. (Note that I chose a mu and a sigma myself to generate the sample on which the model is based.)
I then built the code from various examples and included the sample_prior_predictive and sample_posterior_predictive calls following the “Samplers” section of the API — PyMC 5.1.2 documentation. The code runs (on my computer) without errors.
On the other hand, I cannot find how to concretely use the posterior predictive distribution from my program to generate the histogram of the maximum values of the samples drawn from it.
I am still training myself in Bayesian analysis and have a hard time manipulating PyMC outputs, which I am not yet familiar with. If you could give me some pointers, I would appreciate it.
The code used is:
import numpy as np
import pymc as pm
import arviz as az

np.random.seed(63123)
data = np.random.normal(loc=600, scale=30, size=20)

with pm.Model() as model:
    # Priors for unknown model parameters
    mu = pm.Uniform('mu', lower=0, upper=2000)
    sigma = pm.Uniform('sigma', lower=0, upper=100)
    # Likelihood of observations
    lkd = pm.Normal('likelihood', mu=mu, sigma=sigma, observed=data)
    # Expected value of outcome (WHAT IS THE USE ???)
    weights = pm.Normal('weights', mu=mu, sigma=sigma)
    # Inference data
    idata = pm.sample()
    # Predictives (arguments left at their defaults are omitted)
    idata.extend(pm.sample_prior_predictive(samples=500))
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)