Hello, I’m new to PyMC (and Bayesian regression modeling in general) so it’s likely this question is covered in the docs, but I’m not sure exactly what keyword I need to search for.
Basically, I am trying to sauce up a basic regression of the form $y = a_0 + a_1 x$ with random effects associated with different treatments of $x$. So, in addition to each (continuous) $x$ observation, I also have indices $i$ (subject), $j$ (trial), and $k$ (experimenter), and the model I actually want to estimate can be written as

$$
Y \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu = a_0 + a_1 x + b_i + c_j + d_k
$$

(notation follows https://www.pymc.io/projects/docs/en/latest/learn/core_notebooks/pymc_overview.html), where $b$, $c$, and $d$ capture any potential bias associated with the given treatments.
Below, I’ve implemented the model in PyMC with some sample data:
```python
import arviz as az
import numpy as np
import pymc as pm

# Integer indices for subject (i), trial (j), and experimenter (k),
# plus the continuous predictor x and the response y
i_data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
j_data = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 2, 2])
k_data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
x_data = np.array([1.5, 2.5, 1.5, 6.5, 1.4, 1.5, 1.8, 0.5, 5.5, 1.5, 1.5])
y_data = np.array([4.5, 1.5, 5.5, 4.5, 4.7, 7.5, 4.5, 4.6, 4.5, 9.5, 8.5])

model = pm.Model()
with model:
    # Intercept/slope plus one bias term per level of each treatment variable
    a = pm.Normal("a", mu=0.0, sigma=1.0, shape=2)
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=np.unique(i_data).size)
    c = pm.Normal("c", mu=0.0, sigma=1.0, shape=np.unique(j_data).size)
    d = pm.Normal("d", mu=0.0, sigma=1.0, shape=np.unique(k_data).size)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # Build mu observation by observation with a Python list comprehension
    mu = [
        a[0] + a[1] * x + b[i] + c[j] + d[k]
        for i, j, k, x in zip(i_data, j_data, k_data, x_data)
    ]

    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_data)
    idata = pm.sample()

print(az.summary(idata))
```
This code works, but I am not in love with it.
- It is pretty slow if I use a large dataset: fewer than 10 draws/second (although I admit I don't have a baseline sense of how fast it should be).
- The variable declarations `b`, `c`, and `d` and the expression for `mu` are repetitive and will become unwieldy if there are more than 3 treatment variables.
Is there a more PyMC-native way of implementing a model like this that would allow me to declare `i`, `j`, and `k` as random-effects variables and assign their priors in fewer steps? (I realize I could loop over the columns of a data frame, but I'm more interested in actually moving the list comprehension for `mu` into native PyMC logic.)
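
To make the question concrete, here is the kind of vectorized version I'm imagining, using named `coords`/`dims` and indexing the effect vectors directly with the integer index arrays. This is only a sketch of what I have in mind (same sample data as above); I'm not sure it's the recommended idiom:

```python
import arviz as az
import numpy as np
import pymc as pm

i_data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
j_data = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 2, 2])
k_data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
x_data = np.array([1.5, 2.5, 1.5, 6.5, 1.4, 1.5, 1.8, 0.5, 5.5, 1.5, 1.5])
y_data = np.array([4.5, 1.5, 5.5, 4.5, 4.7, 7.5, 4.5, 4.6, 4.5, 9.5, 8.5])

# Name the levels of each treatment so the effect vectors get labeled dims
coords = {
    "subject": np.unique(i_data),
    "trial": np.unique(j_data),
    "experimenter": np.unique(k_data),
}

with pm.Model(coords=coords) as model:
    a = pm.Normal("a", mu=0.0, sigma=1.0, shape=2)
    b = pm.Normal("b", mu=0.0, sigma=1.0, dims="subject")
    c = pm.Normal("c", mu=0.0, sigma=1.0, dims="trial")
    d = pm.Normal("d", mu=0.0, sigma=1.0, dims="experimenter")
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # Index each effect vector with its integer index array, so mu is one
    # array expression rather than a Python loop over observations
    mu = a[0] + a[1] * x_data + b[i_data] + c[j_data] + d[k_data]

    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_data)
    idata = pm.sample()

print(az.summary(idata))
```

This removes the Python loop over observations, but it still repeats the prior declaration once per treatment variable, which is the part I'd most like to streamline.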