More elegant way to construct random effects model

Hello, I’m new to PyMC (and Bayesian regression modeling in general) so it’s likely this question is covered in the docs, but I’m not sure exactly what keyword I need to search for.

Basically, I am trying to sauce up a basic regression of the form y = a_0 + a_1 x with random effects associated with different treatments of x. So, in addition to each (continuous) x observation, I also have indices i (subject), j (trial), and k (experimenter), and the model I actually want to estimate can be written as

Y_{ijk} \sim \mathcal{N}(\mu_{ijk}, \sigma^2)
\mu_{ijk} = a_0 + a_1 X_{ijk} + b_i + c_j + d_k

(notation follows https://www.pymc.io/projects/docs/en/latest/learn/core_notebooks/pymc_overview.html)

where b, c, and d capture any potential bias associated with the given treatments.

Below, I’ve implemented the model in PyMC with some sample data:

import arviz as az
import numpy as np
import pymc as pm

i_data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
j_data = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 2, 2])
k_data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
x_data = np.array([1.5, 2.5, 1.5, 6.5, 1.4, 1.5, 1.8, 0.5, 5.5, 1.5, 1.5])
y_data = np.array([4.5, 1.5, 5.5, 4.5, 4.7, 7.5, 4.5, 4.6, 4.5, 9.5, 8.5])

model = pm.Model()

with model:
    a = pm.Normal("a", mu=0.0, sigma=1.0, shape=2)
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=np.unique(i_data).size)
    c = pm.Normal("c", mu=0.0, sigma=1.0, shape=np.unique(j_data).size)
    d = pm.Normal("d", mu=0.0, sigma=1.0, shape=np.unique(k_data).size)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    mu = [
        a[0] + a[1] * x + b[i] + c[j] + d[k]
        for i, j, k, x in zip(i_data, j_data, k_data, x_data)
    ]

    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_data)

    idata = pm.sample()

print(az.summary(idata))

This code works, but I am not in love with it.

  • It is pretty slow if I use a large dataset: <10 draws/second (although I admit I don’t have a baseline sense of how fast it should be)
  • The declarations of b, c, and d and the expression for mu are repetitive and will become unwieldy if there are more than 3 treatment variables.

Is there a more PyMC-native way of implementing a model like this that would allow me to declare i, j, and k as random-effects variables and assign their priors in fewer steps?

(I realize I could loop over the columns of a data frame, but I’m more interested in moving the list comprehension for mu into native PyMC logic.)

You want to use advanced indexing:

mu = a[0] + a[1] * x_data + b[i_data] + c[j_data] + d[k_data]
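In case it helps to see why this is equivalent: indexing an array with an integer array (NumPy-style advanced indexing) gathers one entry per observation, so b[i_data] produces exactly the per-observation vector your list comprehension builds, without the Python loop. A quick NumPy-only check (the values here are made up for illustration):

```python
import numpy as np

i_data = np.array([0, 0, 1, 1, 0])  # subject index for each observation
b = np.array([0.3, -0.2])           # one random-effect value per subject

# Advanced indexing: pick b[i] for each observation's subject index at once
vectorized = b[i_data]

# The same thing, built one element at a time as in the list comprehension
looped = np.array([b[i] for i in i_data])

assert np.array_equal(vectorized, looped)
```

Inside a PyMC model the same indexing expression stays symbolic, so the whole mu vector is one graph node instead of a list of 11 scalar expressions, which is also why sampling gets much faster.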

You can use data containers with labeled dimensions to make the code more readable as well.
