Plot posterior predictive results scatterplot in specific format using arviz

matsuo_basho · July 9, 2023, 5:30pm

I’m doing problem 16H2 in McElreath’s Stat Rethinking 2 and want to do some plots to understand the posterior predictive samples across different individuals.

I’m not finding a tutorial on how to manipulate the InferenceData object to get what I want.
The panda_nuts underlying dataset can be downloaded here.

The dataset contains 84 nut-opening related records for 22 chimpanzees. We are modeling the number of nuts a given chimpanzee opened by modeling the rate of nuts opened and then using a Poisson distribution.

import warnings

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pytensor.tensor as tens

panda_nuts = pd.read_csv('data/panda_nuts.txt', sep=';')

seconds = panda_nuts.seconds
indiv = panda_nuts.chimpanzee
age = panda_nuts.age / panda_nuts.age.max()

with pm.Model() as unpooled_indiv:
    phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1)  # proportionality constant (product of proportionality of mass
                                                        # to strength and prop of strength to nut-opening)
    k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, shape=panda_nuts.chimpanzee.size)  # Rate of nut-opening skill gain with age
    theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25)  # Positive returns to strength
    
    secs_var = pm.Data('secs', seconds.values)
    age_var = pm.Data('age', age.values)
    
    lam = secs_var * phi * (1 - pm.math.exp(-k[indiv] * age_var))**theta
    n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
    
    res_complete_indiv = pm.sample()

with unpooled_indiv:
    res = pm.sample_posterior_predictive(res_complete_indiv)

The image I want to show:
Scatter-plot where x-axis is each of the 22 individual chimpanzees and y-axis contains the predicted number of nuts opened (lambda from above). I’d also like to show a horizontal bar for the average number of nuts opened for each individual (but this is optional if it’s not possible with Arviz).

Links to tutorials welcome, I’m just trying to avoid doing this with matplotlib because I think the syntax is unnecessarily complex and I understand Arviz is set up for exactly this type of task.

Just in case, I also have a formulation of the model using the dims functionality, not sure if it would make manipulating the arviz inference data object easier:

chimp_ind, chimp = panda_nuts.chimpanzee.factorize()
coords = {"indiv": chimp}

with pm.Model(coords = coords) as unpooled_indiv:
    phi = pm.LogNormal('phi', mu=np.log(1), sigma=0.1)  # proportionality constant (product of proportionality of mass
                                                        # to strength and prop of strength to nut-opening)
    k = pm.LogNormal('k', mu=np.log(2), sigma=0.25, dims="indiv")  # Rate of nut-opening skill gain with age
    theta = pm.LogNormal('theta', mu=np.log(5), sigma=0.25)  # Positive returns to strength
    
    secs_var = pm.Data('secs', seconds.values)
    age_var = pm.Data('age', age.values)
    
    lam = secs_var * phi * (1 - pm.math.exp(-k[chimp_ind] * age_var))**theta
    n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened)
    
    res_complete_indiv = pm.sample()

Simon · July 10, 2023, 8:23am

For your first example you can simply do (a tip @jessegrabowski gave a while ago):

posterior = idata.stack(sample = ['chain', 'draw']).posterior

Note that res_complete_indiv = idata (I generally prefer to use the name idata to follow PyMC
conventions). Then if you want to extract a parameter in numpy format you can use:

posterior['theta'].values

You can do the same with the posterior predictive:

preds = res.stack(sample = ['chain', 'draw']).posterior_predictive

Then plotting with plt is not really that cumbersome.

import matplotlib.pyplot as plt

chimps = panda_nuts.chimpanzee.values
pred_mean = preds['n'].values.mean(axis=1)

fig, ax = plt.subplots(1,1, figsize=(5,5))
ax.scatter(chimps, pred_mean, color='b', label="predicted means")
ax.set_xlabel("Chimp")
ax.set_ylabel("Nuts opened")
ax.legend()

As far as I know you cannot directly produce this type of scatter with Arviz. Hope I understood you question correctly and this helps.

jessegrabowski · July 10, 2023, 9:19am

There is a convenience function in arviz to do this as well: posterior = az.extract(idata)

OriolAbril · July 11, 2023, 7:32pm

You can also name the dimensions in the poisson and add its coordinates with something like:

coords = {"indiv": chimp, "indiv_id": chimp_ind}

...

n = pm.Poisson('n', lam, observed = panda_nuts.nuts_opened, dims="indiv_id")
    
    res_complete_indiv = pm.sample()
    pm.sample_posterior_predictive(res_complete_indiv, extend_inferencedata=True)

then plot directly with xarray:

pred_means = res_complete_indiv.posterior_predictive["n"].mean(["chain", "draw"])
pred_means.plot.scatter(x="indiv_id")

which will generate the following plot:

imatge

Topic		Replies	Views
How to structure model with different priors on values of a categorical variable	0	214	May 16, 2023
Visualizing highest posterior density for multiple conditions using arviz plot_hpd Questions	7	2906	November 8, 2021
How to retrieve information from inference data? v5 arviz	4	849	April 2, 2023
How to get the labels for the predictive distribution? version agnostic bambi	2	545	May 31, 2022
Bug in sample_prior_predictive Questions	6	720	November 14, 2018

Plot posterior predictive results scatterplot in specific format using arviz

Related Topics