# Mixed models and pm.LKJCholeskyCov

I’m trying to understand how to use correlated priors for the varying intercepts and slopes in mixed models because I would like to incorporate that into Bambi. I’m using the famous sleepstudy dataset. This post aims mainly to check whether what I’m doing is correct and/or check if there are better alternatives.

First of all, I fit the model using independent priors.

``````# Libraries
import arviz as az
import pandas as pd
import pymc3 as pm
import theano.tensor as tt

from formulae import design_matrices

"https://raw.githubusercontent.com/vincentarelbundock/Rdatasets/master/csv/lme4/sleepstudy.csv",
index_col = 0
)

# Get design matrices for common and group-specific parts
dm = design_matrices("Reaction ~ Days + (Days|Subject)", data)
X = dm.common.design_matrix
Z_0 = dm.group["1|Subject"]
Z_slope = dm.group["Days|Subject"]

# Just the mean of the prior for the intercept
β_0_mean = data["Reaction"].mean()

# Model 1:Independent priors for varying intercept and slopes

with pm.Model() as model:
# Common part
β_0 = pm.HalfNormal("β_0", sigma=100)
β_days = pm.Normal("β_days", mu=0, sigma=100)
β = tt.stack(β_0, β_days)

# Group-specific part, with non-centered parametrization
# Intercept
b_0_σ = pm.HalfNormal("b_0_σ", sigma=100)
b_0_offset = pm.Normal("b_0_offset", mu=0, sigma=1, shape=Z_0.shape)
b_0 = pm.Deterministic("b_0", b_0_offset * b_0_σ)

# Slope
b_days_σ = pm.HalfNormal("b_days_σ", sigma=100)
b_days_offset = pm.Normal("b_days_offset", mu=0, sigma=1, shape=Z_slope.shape)
b_days = pm.Deterministic("b_days", b_days_offset * b_days_σ)

# Conditional mean
μ = pm.math.dot(X, β) + pm.math.dot(Z_0, b_0) + pm.math.dot(Z_slope, b_days)

σ = pm.HalfNormal("σ", sigma=100)
y = pm.Normal('y', mu=μ, sigma=σ, observed=data.Reaction)

idata = pm.sample(return_inferencedata=True)
``````

And everything works great.

But here’s the thing when I go to model 2

``````
coords = {
"Subject": data["Subject"].unique(),
"Effect": ["Intercept", "Slope"]
}
with pm.Model(coords=coords) as model_2:
# Common part
β_0 = pm.HalfNormal("β_0", sigma=100)
β_days = pm.Normal("β_days", mu=0, sigma=100)
β = tt.stack(β_0, β_days)

# Group-specific part
b_σ = pm.HalfNormal.dist(sigma=100, dims="Effect")
chol, _, _ = pm.LKJCholeskyCov('chol', n=2, eta=1, sd_dist=b_σ, compute_corr=True)
b_raw = pm.Normal('b_raw', mu=0, sigma=1, dims=("Subject", "Effect"))
b = pm.Deterministic('b', tt.dot(chol, b_raw.T).T, dims=("Subject", "Effect"))

mu = pm.math.dot(X, β) + pm.math.dot(Z_0, b[:, 0]) + pm.math.dot(Z_slope, b[:, 1])

σ = pm.HalfNormal("σ", sigma=100)
y = pm.Normal('y', mu=mu, sigma=σ, observed=data.Reaction)

idata = pm.sample(return_inferencedata=True)
``````

It works, but slower, and sometimes I get divergences… I was expecting the opposite.

• Anyone knows if my approach using the LKJCholeskyCov could be improved?
• Is it possible to rename names derived from `chol`, like `chol_stds`, and only keep non-redundant and non-constant dimensions? That causes problems when calling `az.plot_trace()` with `chol_coorr` variable name, because of the correlations that are equal to 1. For example, I would like to have `chol_stds``b_sigma[Interecept]` and `chol_stds``b_sigma[Slope]`.
• I also tried passing `dims` to `pm.LKJCholeskyCov` to obtain labeled coordinates, but there’s no effect. Is it possible to have labeled dimensions when using `pm.LKJCholeskyCov`?

I don’t include the trace plot where I use `az.plot_trace(idata, var_names="chol_corr")`. But I get a lot of warning messages because of the 1 correlations.

This is my current approach now.

``````
coords = {
"subject": data["Subject"].unique(),
"effect": ["Intercept", "Slope"]
}
with pm.Model(coords=coords) as model_3:
# Common part
β_0 = pm.HalfNormal("β_0", sigma=100)
β_days = pm.Normal("β_days", mu=0, sigma=100)
β = tt.stack(β_0, β_days)

# Group-specific part
b_σ = pm.HalfNormal.dist(sigma=100)
chol, corr, sigma = pm.LKJCholeskyCov('chol', n=2, eta=1, sd_dist=b_σ, compute_corr=True, store_in_trace=False)
b_raw = pm.Normal('b_raw', mu=0, sigma=1, dims=("subject", "effect"))
b = pm.Deterministic('b', tt.dot(chol, b_raw.T).T, dims=("subject", "effect"))

# Separate
b_sigma = pm.Deterministic("b_sigma", sigma, dims="effect")
b_corr = pm.Deterministic("b_corr", corr[0, 1])

mu = pm.math.dot(X, β) + pm.math.dot(Z_0, b[:, 0]) + pm.math.dot(Z_slope, b[:, 1])

σ = pm.HalfNormal("σ", sigma=100)
y = pm.Normal('y', mu=mu, sigma=σ, observed=data.Reaction)

idata = pm.sample(return_inferencedata=True)

# Some cleanup in the posterior object.
idata.posterior = idata.posterior.drop_vars(["chol", "b_raw"])
idata.posterior = idata.posterior.drop_dims("chol_dim_0")
idata.posterior = idata.posterior.transpose("chain", "draw", "effect", "subject")
``````

And it gives a better trace plot (in my opinion)

Unfortunately, it is still not possible to compact using only one dimension, so varying effects for the intercept and slope are plotted together.