Mixed models and pm.LKJCholeskyCov

I’m trying to understand how to use correlated priors for the varying intercepts and slopes in mixed models because I would like to incorporate that into Bambi. I’m using the famous sleepstudy dataset. This post aims mainly to check whether what I’m doing is correct and/or check if there are better alternatives.

First of all, I fit the model using independent priors.

# Libraries
import arviz as az
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

from formulae import design_matrices

# Read data
data = pd.read_csv(
    "https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/lme4/sleepstudy.csv",
    index_col = 0
)

# Get design matrices for common and group-specific parts
dm = design_matrices("Reaction ~ Days + (Days|Subject)", data)
X = dm.common.design_matrix
Z_0 = dm.group["1|Subject"]
Z_slope = dm.group["Days|Subject"]

# Just the mean of the prior for the intercept 
β_0_mean = data["Reaction"].mean()

# Model 1:Independent priors for varying intercept and slopes

with pm.Model() as model:
    # Common part
    β_0 = pm.HalfNormal("β_0", sigma=100)
    β_days = pm.Normal("β_days", mu=0, sigma=100)
    β = tt.stack(β_0, β_days)
    
    # Group-specific part, with non-centered parametrization
    # Intercept
    b_0_σ = pm.HalfNormal("b_0_σ", sigma=100)
    b_0_offset = pm.Normal("b_0_offset", mu=0, sigma=1, shape=Z_0.shape[1])
    b_0 = pm.Deterministic("b_0", b_0_offset * b_0_σ)
    
    # Slope
    b_days_σ = pm.HalfNormal("b_days_σ", sigma=100)
    b_days_offset = pm.Normal("b_days_offset", mu=0, sigma=1, shape=Z_slope.shape[1])
    b_days = pm.Deterministic("b_days", b_days_offset * b_days_σ)
    
    # Conditional mean
    μ = pm.math.dot(X, β) + pm.math.dot(Z_0, b_0) + pm.math.dot(Z_slope, b_days)
    
    σ = pm.HalfNormal("σ", sigma=100)
    y = pm.Normal('y', mu=μ, sigma=σ, observed=data.Reaction)
    
    idata = pm.sample(return_inferencedata=True)

And everything works great.


But here’s the thing when I go to model 2


coords = {
    "Subject": data["Subject"].unique(),
    "Effect": ["Intercept", "Slope"]
}
with pm.Model(coords=coords) as model_2:
    # Common part
    β_0 = pm.HalfNormal("β_0", sigma=100)
    β_days = pm.Normal("β_days", mu=0, sigma=100)
    β = tt.stack(β_0, β_days)
    
    # Group-specific part
    b_σ = pm.HalfNormal.dist(sigma=100, dims="Effect")
    chol, _, _ = pm.LKJCholeskyCov('chol', n=2, eta=1, sd_dist=b_σ, compute_corr=True)
    b_raw = pm.Normal('b_raw', mu=0, sigma=1, dims=("Subject", "Effect"))
    b = pm.Deterministic('b', tt.dot(chol, b_raw.T).T, dims=("Subject", "Effect"))    
    
    mu = pm.math.dot(X, β) + pm.math.dot(Z_0, b[:, 0]) + pm.math.dot(Z_slope, b[:, 1])
    
    σ = pm.HalfNormal("σ", sigma=100)
    y = pm.Normal('y', mu=mu, sigma=σ, observed=data.Reaction)
    
    idata = pm.sample(return_inferencedata=True)

It works, but slower, and sometimes I get divergences… I was expecting the opposite.

  • Anyone knows if my approach using the LKJCholeskyCov could be improved?
  • Is it possible to rename names derived from chol, like chol_stds, and only keep non-redundant and non-constant dimensions? That causes problems when calling az.plot_trace() with chol_coorr variable name, because of the correlations that are equal to 1. For example, I would like to have chol_stds[0]b_sigma[Interecept] and chol_stds[1]b_sigma[Slope].
  • I also tried passing dims to pm.LKJCholeskyCov to obtain labeled coordinates, but there’s no effect. Is it possible to have labeled dimensions when using pm.LKJCholeskyCov?

I don’t include the trace plot where I use az.plot_trace(idata, var_names="chol_corr"). But I get a lot of warning messages because of the 1 correlations.

This is my current approach now.


coords = {
    "subject": data["Subject"].unique(),
    "effect": ["Intercept", "Slope"]
}
with pm.Model(coords=coords) as model_3:
    # Common part
    β_0 = pm.HalfNormal("β_0", sigma=100)
    β_days = pm.Normal("β_days", mu=0, sigma=100)
    β = tt.stack(β_0, β_days)
    
    # Group-specific part   
    b_σ = pm.HalfNormal.dist(sigma=100)
    chol, corr, sigma = pm.LKJCholeskyCov('chol', n=2, eta=1, sd_dist=b_σ, compute_corr=True, store_in_trace=False)
    b_raw = pm.Normal('b_raw', mu=0, sigma=1, dims=("subject", "effect"))
    b = pm.Deterministic('b', tt.dot(chol, b_raw.T).T, dims=("subject", "effect")) 
    
    # Separate 
    b_sigma = pm.Deterministic("b_sigma", sigma, dims="effect")
    b_corr = pm.Deterministic("b_corr", corr[0, 1])
    
    mu = pm.math.dot(X, β) + pm.math.dot(Z_0, b[:, 0]) + pm.math.dot(Z_slope, b[:, 1])
    
    σ = pm.HalfNormal("σ", sigma=100)
    y = pm.Normal('y', mu=mu, sigma=σ, observed=data.Reaction)
    
    idata = pm.sample(return_inferencedata=True)

# Some cleanup in the posterior object.
idata.posterior = idata.posterior.drop_vars(["chol", "b_raw"])
idata.posterior = idata.posterior.drop_dims("chol_dim_0")
idata.posterior = idata.posterior.transpose("chain", "draw", "effect", "subject")

And it gives a better trace plot (in my opinion)

Unfortunately, it is still not possible to compact using only one dimension, so varying effects for the intercept and slope are plotted together.