Visualizing highest posterior density for multiple conditions using arviz plot_hpd

Hello,
I am trying to visualize simple linear regression with highest posterior density (hpd) for multiple groups. However, I have a problem to apply hpd. Whenever I ran this code, I am extracting the same posterior density for each condition. I would like to visualize posterior density that corresponds to it’s group. How can I plot hpd’s for each group with arviz.plot_hpd?

import pymc3 as pm
import numpy as np
import matplotlib.pyplot as plt
import arviz as az
import pandas as pd

# data

data = pd.read_csv('www_MCMC/MCMC/data.csv')
rsp = data['Mean Response'].values
rt = data['Mean Reaction Time'].values
idx = pd.Categorical(data['Structure'], categories=['No Background', 'Only Road', 'Only Dot Ground', 'Dot Terrain + Dot Ground', 'Space', 'Full Background']).codes
groups = len(np.unique(idx))

# model

with pm.Model() as rsp_rt:
        
    α = pm.Normal('α', mu=0, sd=10, shape=groups)
    β = pm.Normal('β', mu=0, sd=10, shape=groups)
    ϵ = pm.HalfCauchy('ϵ', 10)
    
    μ = pm.Deterministic('μ', α[idx] + β[idx] * rt)
    
    y_pred = pm.Normal('y_pred2', mu=μ, sd=ϵ, observed=rsp)
    
    trace_rsp_rt = pm.sample(cores=1) 
    
# graphics

_, ax_rsp_rt = plt.subplots(2, 3, figsize=(10, 5), sharex=True, sharey=True, constrained_layout=True)
ax_rsp_rt = np.ravel(ax_rsp_rt)

for i in range(groups):
    
    alpha = trace_rsp_rt['α'][:, i].mean()
    beta = trace_rsp_rt['β'][:, i].mean()
    
    ax_rsp_rt[i].plot(rt, alpha + beta * rt, c='k', label= f'rsp = {alpha:.2f} + {beta:.2f} * rt')
    az.plot_hpd(rt, trace_rsp_rt['μ'], credible_interval=0.98, color='k', ax=ax_rsp_rt[i])
    ax_rsp_rt[i].legend()
    plt.xlim(1.2, 1.8)
    plt.ylim(0.6, 1) 

1 Like

Hi,

There are a couple of small modifications to your code that should fix the problem, however, I’d recommend taking advantage of ArviZ and xarray as it is shown in this notebook. If you have any trouble with the notebook please ask here below, I hope the model will be similar enough and the explanations clear enough.

for i in range(groups):
    
    alpha = trace_rsp_rt['α'][:, i]
    beta = trace_rsp_rt['β'][:, i]
    mu = alpha + beta * rt  
    # there may be broadcasting issues requiring to use rt[None, :]
    # xarray would handle broadcasting automatically ass seen in the notebook
    
    ax_rsp_rt[i].plot(rt, mu.mean(), c='k', label= f'rsp = {alpha:.2f} + {beta:.2f} * rt')
    az.plot_hpd(rt, mu, credible_interval=0.98, color='k', ax=ax_rsp_rt[i])
    ax_rsp_rt[i].legend()
    # combining pyplot and object based commands can yield unexpected results
    ax.set_xlim(1.2, 1.8)  
    ax.set_ylim(0.6, 1) 

On a somewhat tangential topic, note that μ in trace_rsp_rt is the distribution of the mean, not the posterior predictive distribution which is what should be used for predicting and for posterior predictive checks (comparing the predictions of the model with observed data).

EDIT: The notebook is not yet available in https://docs.pymc.io/ but will be updated soon.

2 Likes

Thank you very much for the notebook and your comment. It helps me to understand practical and conceptual levels.

1 Like