I’m trying to do some multilevel modeling and am not sure how to actually specify the multiple levels in my case.
The existing tutorials don’t seem to address what I’m try to do (A Primer on Bayesian Methods for Multilevel Modeling, @OriolAbril 's PyMC3 with labeled coords and dims).
In the demo below, I have six groups, which have three replicates each.
The groups may have different number of observations, as show below.
There could be a case where some groups have different numbers of replicates, too.
Generate Data
num_groups = 6
num_replicates = 3
mu_a_list = np.arange(num_groups)
sigma_a_list = np.arange(num_groups) * 0.25 + 0.25
sigma_b_list = np.arange(num_groups) * 0.1 + 0.3
df_data = pd.DataFrame()
mu_array = np.zeros((num_replicates, num_groups))
sigma_array = np.zeros((num_replicates, num_groups))
for i in range(num_groups):
mu_a = mu_a_list[i]
sigma_a = sigma_a_list[i]
for j in range(num_replicates):
sigma_b = sigma_b_list[j]
mu_ij = stats.norm.rvs(loc=mu_a, scale=sigma_a)
sigma_ij = stats.halfnorm.rvs(scale=sigma_b)
mu_array[j,i] = mu_ij
sigma_array[j,i] = sigma_ij
x = stats.norm.rvs(loc=mu_ij, scale=sigma_ij, size=(2000 + i + j))
df_temp = pd.DataFrame(data=x, columns=["height"])
df_temp["group"] = i
df_temp["replicate"] = j
df_data = pd.concat((df_data, df_temp), axis=0)
df_data = df_data.reset_index(drop=True)
fig, axes = plt.subplots(nrows=num_replicates, ncols=num_groups, figsize=(20, 10))
for i in range(num_groups):
for j in range(num_replicates):
plt.sca(axes[j, i])
plt.hist(
df_data.loc[
np.logical_and(df_data["group"] == i, df_data["replicate"] == j),
"height",
]
)
plt.xlim(-10, 10)
Inference with PyMC3
I would like to recover mu_a_list
, sigma_a_list
, sigma_b_list
, mu_array
, sigma_array
.
This model doesn’t work, but is in the direction of what I think should work. In part, I don’t know how to specify the joint membership of an observation in both a group
and replicate
group_idx, groups = pd.factorize(df_data["group"], sort=True)
replicate_idx, replicates = pd.factorize(df_data["replicate"], sort=True)
coords = {
"group": groups,
"obs_id": np.arange(df_data.shape[0]),
"replicate":replicates,
}
model = pm.Model(coords=coords)
with model:
BoundHalfNormal = pm.Bound(pm.HalfNormal, lower=0)
mu_a = pm.Normal("μ_a", mu=0, sigma=10, dims="group")
sigma_a = BoundHalfNormal("σ_a", sigma=10, dims="group")
sigma_b = BoundHalfNormal("σ_b", sigma=10, dims="group")
mu = pm.Normal("μ", mu=mu_a[group_idx], sigma=sigma_a[group_idx], dims=("group", "replicate"))
sigma = BoundHalfNormal("σ", sigma=sigma_b[replicate_idx], dims=("group", "replicate"))
obs = pm.Normal(
"obs",
mu=mu[group_idx, replicate_idx],
sigma=sigma[group_idx, replicate_idx],
observed=df_data["height"],
dims="obs_id",
)