I’m doing problem 16H2 in McElreath’s Stat Rethinking 2 and want to do some plots to understand the posterior predictive samples across different individuals.
I’m not finding a tutorial on how to manipulate the InferenceData object to get what I want.
The panda_nuts underlying dataset can be downloaded here.
The dataset contains 84 nut-opening related records for 22 chimpanzees. We are modeling the number of nuts a given chimpanzee opened by modeling the rate of nuts opened and then using a Poisson distribution.
import warnings
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as tens
panda_nuts = pd.read_csv('data/panda_nuts.txt', sep=';')
seconds = panda_nuts.seconds
indiv = panda_nuts.chimpanzee
age = panda_nuts.age / panda_nuts.age.max()
with pm.Model() as unpooled_indiv:
phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1) # proportionality constant (product of proportionality of mass
# to strength and prop of strength to nut-opening)
k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, shape=panda_nuts.chimpanzee.size) # Rate of nut-opening skill gain with age
theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25) # Positive returns to strength
secs_var = pm.Data('secs', seconds.values)
age_var = pm.Data('age', age.values)
lam = secs_var * phi * (1 - pm.math.exp(-k[indiv] * age_var))**theta
n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
res_complete_indiv = pm.sample()
with unpooled_indiv:
res = pm.sample_posterior_predictive(res_complete_indiv)
The image I want to show:
Scatter-plot where x-axis is each of the 22 individual chimpanzees and y-axis contains the predicted number of nuts opened (lambda from above). I’d also like to show a horizontal bar for the average number of nuts opened for each individual (but this is optional if it’s not possible with Arviz).
Links to tutorials welcome, I’m just trying to avoid doing this with matplotlib because I think the syntax is unnecessarily complex and I understand Arviz is set up for exactly this type of task.
Just in case, I also have a formulation of the model using the dims functionality, not sure if it would make manipulating the arviz inference data object easier:
chimp_ind, chimp = panda_nuts.chimpanzee.factorize()
coords = {"indiv": chimp}
with pm.Model(coords = coords) as unpooled_indiv:
phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1) # proportionality constant (product of proportionality of mass
# to strength and prop of strength to nut-opening)
k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, dims="indiv") # Rate of nut-opening skill gain with age
theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25) # Positive returns to strength
secs_var = pm.Data('secs', seconds.values)
age_var = pm.Data('age', age.values)
lam = secs_var * phi * (1 - pm.math.exp(-k[chimp_ind] * age_var))**theta
n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
res_complete_indiv = pm.sample()