Plot posterior predictive results scatterplot in specific format using arviz

I’m doing problem 16H2 in McElreath’s Stat Rethinking 2 and want to do some plots to understand the posterior predictive samples across different individuals.

I’m not finding a tutorial on how to manipulate the InferenceData object to get what I want.
The panda_nuts underlying dataset can be downloaded here.

The dataset contains 84 nut-opening related records for 22 chimpanzees. We are modeling the number of nuts a given chimpanzee opened by modeling the rate of nuts opened and then using a Poisson distribution.

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as tens

panda_nuts = pd.read_csv('data/panda_nuts.txt', sep=';')

seconds = panda_nuts.seconds
indiv = panda_nuts.chimpanzee
age = panda_nuts.age / panda_nuts.age.max()

with pm.Model() as unpooled_indiv:
    phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1)  # proportionality constant (product of proportionality of mass
                                                        # to strength and prop of strength to nut-opening)
    k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, shape=panda_nuts.chimpanzee.size)  # Rate of nut-opening skill gain with age
    theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25)  # Positive returns to strength
    
    secs_var = pm.Data('secs', seconds.values)
    age_var = pm.Data('age', age.values)
    
    lam = secs_var * phi * (1 - pm.math.exp(-k[indiv] * age_var))**theta
    n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
    
    res_complete_indiv = pm.sample()

with unpooled_indiv:
    res = pm.sample_posterior_predictive(res_complete_indiv)

The image I want to show:
Scatter-plot where x-axis is each of the 22 individual chimpanzees and y-axis contains the predicted number of nuts opened (lambda from above). I’d also like to show a horizontal bar for the average number of nuts opened for each individual (but this is optional if it’s not possible with Arviz).

Links to tutorials welcome, I’m just trying to avoid doing this with matplotlib because I think the syntax is unnecessarily complex and I understand Arviz is set up for exactly this type of task.

Just in case, I also have a formulation of the model using the dims functionality, not sure if it would make manipulating the arviz inference data object easier:

chimp_ind, chimp = panda_nuts.chimpanzee.factorize()
coords = {"indiv": chimp}

with pm.Model(coords = coords) as unpooled_indiv:
    phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1)  # proportionality constant (product of proportionality of mass
                                                        # to strength and prop of strength to nut-opening)
    k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, dims="indiv")  # Rate of nut-opening skill gain with age
    theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25)  # Positive returns to strength
    
    secs_var = pm.Data('secs', seconds.values)
    age_var = pm.Data('age', age.values)
    
    lam = secs_var * phi * (1 - pm.math.exp(-k[chimp_ind] * age_var))**theta
    n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
    
    res_complete_indiv = pm.sample()

For your first example you can simply do (a tip @jessegrabowski gave a while ago):

posterior = idata.stack(sample = ['chain', 'draw']).posterior

Note that res_complete_indiv = idata (I generally prefer to use the name idata to follow PyMC
conventions). Then if you want to extract a parameter in numpy format you can use:

posterior['theta'].values

You can do the same with the posterior predictive:

preds = res.stack(sample = ['chain', 'draw']).posterior_predictive

Then plotting with plt is not really that cumbersome.

import matplotlib.pyplot as plt

chimps = panda_nuts.chimpanzee.values
pred_mean = preds['n'].values.mean(axis=1)

fig, ax = plt.subplots(1,1, figsize=(5,5))
ax.scatter(chimps, pred_mean, color='b', label="predicted means")
ax.set_xlabel("Chimp")
ax.set_ylabel("Nuts opened")
ax.legend()

image

As far as I know you cannot directly produce this type of scatter with Arviz. Hope I understood you question correctly and this helps.

2 Likes

There is a convenience function in arviz to do this as well: posterior = az.extract(idata)

1 Like

You can also name the dimensions in the poisson and add its coordinates with something like:

coords = {"indiv": chimp, "indiv_id": chimp_ind}

...

n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened, dims="indiv_id")
    
    res_complete_indiv = pm.sample()
    pm.sample_posterior_predictive(res_complete_indiv, extend_inferencedata=True)

then plot directly with xarray:

pred_means = res_complete_indiv.posterior_predictive["n"].mean(["chain", "draw"])
pred_means.plot.scatter(x="indiv_id")

which will generate the following plot:

imatge

3 Likes