Hello, I’m new to PyMC (and Bayesian regression modeling in general) so it’s likely this question is covered in the docs, but I’m not sure exactly what keyword I need to search for.
Basically, I am trying to sauce up a basic regression of the form $y = a_0 + a_1 x$ with random effects associated with different treatments of $x$. So, in addition to each (continuous) $x$ observation, I also have indices $i$ (subject), $j$ (trial), and $k$ (experimenter), and the model I actually want to estimate can be written as

$$
Y \sim \mathcal{N}(\mu, \sigma^2), \qquad \mu = a_0 + a_1 x + b_i + c_j + d_k
$$

(notation follows https://www.pymc.io/projects/docs/en/latest/learn/core_notebooks/pymc_overview.html), where $b$, $c$, and $d$ capture any potential bias associated with the given treatments.
Below, I’ve implemented the model in PyMC with some sample data:
```python
import arviz as az
import numpy as np
import pymc as pm

# Integer indices for subject (i), trial (j), and experimenter (k),
# plus the continuous predictor x and the response y
i_data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
j_data = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 2, 2])
k_data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
x_data = np.array([1.5, 2.5, 1.5, 6.5, 1.4, 1.5, 1.8, 0.5, 5.5, 1.5, 1.5])
y_data = np.array([4.5, 1.5, 5.5, 4.5, 4.7, 7.5, 4.5, 4.6, 4.5, 9.5, 8.5])

model = pm.Model()
with model:
    # Intercept/slope plus one bias term per level of each treatment variable
    a = pm.Normal("a", mu=0.0, sigma=1.0, shape=2)
    b = pm.Normal("b", mu=0.0, sigma=1.0, shape=np.unique(i_data).size)
    c = pm.Normal("c", mu=0.0, sigma=1.0, shape=np.unique(j_data).size)
    d = pm.Normal("d", mu=0.0, sigma=1.0, shape=np.unique(k_data).size)
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # Build mu observation by observation with a Python list comprehension
    mu = [
        a[0] + a[1] * x + b[i] + c[j] + d[k]
        for i, j, k, x in zip(i_data, j_data, k_data, x_data)
    ]

    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_data)
    idata = pm.sample()

print(az.summary(idata))
```
This code works, but I am not in love with it.
- It is pretty slow if I use a large dataset: fewer than 10 draws/second (although I admit I don't have a baseline sense of how fast it should be).
- The variable declarations `b`, `c`, and `d` and the expression for `mu` are repetitive and will become unwieldy if there are more than 3 treatment variables.
Is there a more PyMC-native way of implementing a model like this that would allow me to declare `i`, `j`, and `k` as random-effects variables and assign their priors in fewer steps? (I realize I could loop over the columns of a data frame, but I'm more interested in actually moving the list comprehension for `mu` into native PyMC logic.)
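
To make the question concrete, here is the kind of vectorized version I'm imagining, using named `coords`/`dims` and indexing the effect vectors directly with the integer index arrays. This is only a sketch of what I have in mind (same sample data as above); I'm not sure it's the recommended idiom:

```python
import arviz as az
import numpy as np
import pymc as pm

i_data = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
j_data = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 2, 2])
k_data = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1])
x_data = np.array([1.5, 2.5, 1.5, 6.5, 1.4, 1.5, 1.8, 0.5, 5.5, 1.5, 1.5])
y_data = np.array([4.5, 1.5, 5.5, 4.5, 4.7, 7.5, 4.5, 4.6, 4.5, 9.5, 8.5])

# Name the levels of each treatment so the effect vectors get labeled dims
coords = {
    "subject": np.unique(i_data),
    "trial": np.unique(j_data),
    "experimenter": np.unique(k_data),
}

with pm.Model(coords=coords) as model:
    a = pm.Normal("a", mu=0.0, sigma=1.0, shape=2)
    b = pm.Normal("b", mu=0.0, sigma=1.0, dims="subject")
    c = pm.Normal("c", mu=0.0, sigma=1.0, dims="trial")
    d = pm.Normal("d", mu=0.0, sigma=1.0, dims="experimenter")
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # Index each effect vector with its integer index array, so mu is one
    # array expression rather than a Python loop over observations
    mu = a[0] + a[1] * x_data + b[i_data] + c[j_data] + d[k_data]

    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_data)
    idata = pm.sample()

print(az.summary(idata))
```

This removes the Python loop over observations, but it still repeats the prior declaration once per treatment variable, which is the part I'd most like to streamline.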