Multi-Multilevel Modeling

I’m trying to do some multilevel modeling and am not sure how to actually specify the multiple levels in my case.

The existing tutorials don’t seem to address what I’m try to do (A Primer on Bayesian Methods for Multilevel Modeling, @OriolAbril 's PyMC3 with labeled coords and dims).
In the demo below, I have six groups, which have three replicates each.
The groups may have different number of observations, as show below.
There could be a case where some groups have different numbers of replicates, too.

Generate Data


num_groups = 6
num_replicates = 3
mu_a_list = np.arange(num_groups)
sigma_a_list = np.arange(num_groups) * 0.25 + 0.25
sigma_b_list = np.arange(num_groups) * 0.1 + 0.3

df_data = pd.DataFrame()

mu_array = np.zeros((num_replicates, num_groups))
sigma_array = np.zeros((num_replicates, num_groups))

for i in range(num_groups):

    mu_a = mu_a_list[i]
    sigma_a = sigma_a_list[i]

    for j in range(num_replicates):
        sigma_b = sigma_b_list[j]

        mu_ij = stats.norm.rvs(loc=mu_a, scale=sigma_a)
        sigma_ij = stats.halfnorm.rvs(scale=sigma_b)
        
        mu_array[j,i] = mu_ij
        sigma_array[j,i] = sigma_ij
       
        x = stats.norm.rvs(loc=mu_ij, scale=sigma_ij, size=(2000 + i + j))
        df_temp = pd.DataFrame(data=x, columns=["height"])
        df_temp["group"] = i
        df_temp["replicate"] = j
        df_data = pd.concat((df_data, df_temp), axis=0)

df_data = df_data.reset_index(drop=True)

fig, axes = plt.subplots(nrows=num_replicates, ncols=num_groups, figsize=(20, 10))

for i in range(num_groups):
    for j in range(num_replicates):
        plt.sca(axes[j, i])
        plt.hist(
            df_data.loc[
                np.logical_and(df_data["group"] == i, df_data["replicate"] == j),
                "height",
            ]
        )

        plt.xlim(-10, 10)

Inference with PyMC3

I would like to recover mu_a_list, sigma_a_list, sigma_b_list, mu_array, sigma_array.

This model doesn’t work, but is in the direction of what I think should work. In part, I don’t know how to specify the joint membership of an observation in both a group and replicate


group_idx, groups = pd.factorize(df_data["group"], sort=True)
replicate_idx, replicates = pd.factorize(df_data["replicate"], sort=True)
coords = {
    "group": groups,
    "obs_id": np.arange(df_data.shape[0]),
    "replicate":replicates,
}

model = pm.Model(coords=coords)

with model:
    BoundHalfNormal = pm.Bound(pm.HalfNormal, lower=0)

    mu_a = pm.Normal("μ_a", mu=0, sigma=10, dims="group")
    sigma_a = BoundHalfNormal("σ_a", sigma=10, dims="group")
    sigma_b = BoundHalfNormal("σ_b", sigma=10, dims="group")

    mu = pm.Normal("μ", mu=mu_a[group_idx], sigma=sigma_a[group_idx], dims=("group", "replicate"))
    sigma = BoundHalfNormal("σ", sigma=sigma_b[replicate_idx], dims=("group", "replicate"))

    obs = pm.Normal(
        "obs",
        mu=mu[group_idx, replicate_idx],
        sigma=sigma[group_idx, replicate_idx],
        observed=df_data["height"],
        dims="obs_id",
    )
1 Like

Just a small note that you may be aware of: those bounds are not doing anything different than using the half normal directly

@ricardoV94 the Bound? I cleaned this up to share it, and was originally looking at a Gamma distribution downstream and had some errors if a 0 was every sampled. But I’ll keep that in mind without Bound too. Thanks!

About the model, I would suggest you try to build it without dims and add them incrementally to see where it breaks. The double use of group_idx in mu seems funny but I am not sure where the actual problem is.

I am assuming you are getting a shape or index error somewhere?

The latter part is most definitely broken since I’m just not sure how to formulate it. I’m away from my computer at the moment, but the simplest model works, where there is only a single group, so the only dims are the replicate and some of the priors/hyperpriors are no longer needed. I can post the simpler version that works, but I’m not sure how to transition it to the diagram I’ve shown here.

These lines are definitely wrong. group_idx is an array with dim obs_id and values between 0 and (n_groups - 1) so it can be used to index arrays with dimension group and “defactorize” (never known how to call this fancy indexing action) them into arrays of dim obs_id.

In your case, you want to generate group, replicate arrays with draws from a normal variable so that their mean is defined by the group dimension, hence the hiererchy. With the current approach, these two lines should be one of:

mu = pm.Normal("μ", mu=mu_a, sigma=sigma_a, dims=("group", "replicate"))
sigma = BoundHalfNormal("σ", sigma=sigma_b, dims=("group", "replicate"))

or

mu = pm.Normal("μ", mu=mu_a[: , None], sigma=sigma_a[:, None], dims=("group", "replicate"))
sigma = BoundHalfNormal("σ", sigma=sigma_b[:, None], dims=("group", "replicate"))

Note however that this approach will only work with non-ragged data, that is, all groups have the same number of replications. In this non-ragged data case, hierarchies can be represented by adding dimensions to the parameter arrays and having each dimension be a hierarchy level.

In the case of ragged data, you’ll need multiple levels of indexing (alias defactorizer) arrays. Here is a quick example, which I think has come up at other points so would probably be great to have a blogpost/example about this, if you are up to the task feel free to use any code/pseudocode here.

Imagine some data like:

obs_id 1 2 3 4 5 6 7 8
group 1 1 1 2 2 3 3 3
subject
1 2 2 2 2 1 1 2
replicate
1 1 2 1 2 1 2 1

where I have added an extra level ot better showcase the approach. So here we want the group level parameters to set the priors of the subject level ones and the subject level ones to do the same on the replicate ones, which are the ones we use to compare to observations/define the likelihood.

We would therefore define something like:

mu_group = pm.Normal("mu_group", 0, 10, dims="group")
mu_subject = pm.Normal(
    "mu_subject", mu_group[group_subject_idx], 10, dims="group_subject"
)
mu = pm.Norml("mu", mu_subject[subject_rep_idx], 10, dims="obs_id")

In this case, the coordinates and indices would be:

coords = {
    "group": [1, 2, 3], # len 3
    "subject_group": ["1-1", "1-2", "2-2", "3-1", "3-2"], #len 5
    "obs_idx": np.arange(8)
}
# needs values between 0-2 and len 5
group_subject_idx = [0, 0, 1, 2, 2]
# needs values between 0-4 and len 8, 
# could be a factorize of the multiindex converted to tuple for example
subject_rep_idx = [0, 1, 1, 2, 2, 3, 3, 4]
1 Like

Thanks @OriolAbril for all the details! At the moment I have non-ragged data (in terms of equal replicates per group), so I’ll focus on the first half first, but want to make sure I can get your second demo to work too. The definitions for mu and sigma are right now, but how do I handle defining the likelihood now?

This works:

group_idx, groups = pd.factorize(df_data["group"], sort=True)
replicate_idx, replicates = pd.factorize(df_data["replicate"], sort=True)

coords = {
    "group": groups,
    "obs_id": np.arange(df_data.shape[0]),
    "replicate":replicates,
}

model = pm.Model(coords=coords)

with model:
    BoundHalfNormal = pm.Bound(pm.HalfNormal, lower=0)
    
    mu_a = pm.Normal("μ_a", mu=0, sigma=10, dims="group")
    sigma_a = BoundHalfNormal("σ_a", sigma=10, dims="group")
    
    sigma_b = BoundHalfNormal("σ_b", sigma=10, dims="group")

    mu = pm.Normal("μ", mu=mu_a, sigma=sigma_a, dims=("replicate","group"))
    sigma = BoundHalfNormal("σ", sigma=sigma_b, dims=("replicate", "group"))

    obs = pm.Normal(
        "obs",
        mu=mu[replicate_idx, group_idx],
        sigma=sigma[replicate_idx, group_idx],
        observed=df_data["height"],
        dims="obs_id",
    )
    idata = pm.sample(draws=2_000, tune=1_000, chains=4, return_inferencedata=True)

It’s now running (very slowly!) Does this otherwise look okay? And to be clear, the raggedness pertains to number of groups-replicates, vs number of observations per group-replicate?

2 Likes

It looks good yes. The slowness can be due to the hierarchical model generating funnel geometries that the sampler has trouble exploring, you may want to try the non centered parametrization like in the multilevel modeling notebook, but note that which of centered or non-centered works best depends on both model and data, so you may need one here and the other for the actual model with the gammas and so on. Another factor might be the pointwise indexing mu[replicate_idx, group_idx] (I have no idea if how this is implemented/how optimized it is in Theano), it could be the case that the approach I overlaid for ragged data performs better, even for non ragged data (you can always reshape the output after sampling).

In this case you have partially ragged data, you have the same number of replicates in every group, but different number of observations in each group-replicate, otherwise it would be possible to have the observations shaped as 3, 6, 2000 and no indexing would be needed, not even in the likelihood definition.

Hey @OriolAbril some updates!
First, the fully working model may be still be “slow” but at the time, I was comparing it to another model with much fewer observations, which wasn’t fair.

I also gave the ragged-model version a spin and applied to the dummy data above by skipping a replicate in the data generation stage:


num_groups = 6
num_replicates = 3

mu_a_list = np.arange(num_groups)
sigma_a_list = np.arange(num_groups) * 0.25 + 0.25

sigma_b_list = np.arange(num_groups) * 0.1 + 0.3


df_data = pd.DataFrame()

mu_array = np.zeros((num_replicates, num_groups))
sigma_array = np.zeros((num_replicates, num_groups))


for i in range(num_groups):
    mu_a = mu_a_list[i]
    sigma_a = sigma_a_list[i]

    for j in range(num_replicates):
        if j == 2 and i == 1:
            continue
        sigma_b = sigma_b_list[j]

        mu_ij = stats.norm.rvs(loc=mu_a, scale=sigma_a)
        sigma_ij = stats.halfnorm.rvs(scale=sigma_b)

        mu_array[j,i] = mu_ij
        sigma_array[j,i] = sigma_ij

        x = stats.norm.rvs(loc=mu_ij, scale=sigma_ij, size=(100 + i + j))
        df_temp = pd.DataFrame(data=x, columns=["height"])
        df_temp["group"] = i
        df_temp["replicate"] = j
        df_data = pd.concat((df_data, df_temp), axis=0)
df_data = df_data.reset_index(drop=True)
df_data["group_replicate"] = df_data["group"].astype(str) + "-" + df_data["replicate"].astype(str)

replicate_obs_idx, group_replicates = pd.factorize(df_data["group_replicate"], sort=True)
df_data_first = df_data.groupby('group_replicate').first()
group_replicate_idx = df_data_first.loc[group_replicates,"group"].tolist()
__, groups = pd.factorize(df_data["group"], sort=True)
__, replicates = pd.factorize(df_data["replicate"], sort=True)

coords = {
    "group": groups,
    "replicate":replicates,
    "group_replicate":group_replicates,
    "obs_id": np.arange(df_data.shape[0]),
}

model = pm.Model(coords=coords)

with model:

    mu_a = pm.Normal("μ_a", mu=0, sigma=10, dims="group")
    sigma_a = pm.HalfNormal("σ_a", sigma=10, dims="group")

    sigma_b = pm.HalfNormal("σ_b", sigma=10, dims="group")

    mu = pm.Normal("μ", mu=mu_a[group_replicate_idx], sigma=sigma_a[group_replicate_idx], dims="group_replicate")
    sigma = pm.HalfNormal("σ", sigma=sigma_b[group_replicate_idx], dims="group_replicate")

    obs = pm.Normal(
        "obs",
        mu=mu[replicate_obs_idx],
        sigma=sigma[replicate_obs_idx],
        observed=df_data["height"],
        dims="obs_id",
    )

    idata = pm.sample(draws=2_000, tune=500, chains=4, return_inferencedata=True)

Running the previous partially ragged groups-replicates in this manner is now about twice as fast!

This is just a demo dataset, but it does give me a ton of divergences and poor acceptance probability to consider.

1 Like

I know this is an old post, but for clarity:

  1. The “subject_group” key in coords actually be named “group_subject”?
  2. What do the indices of the group_subject_idx list map from and to?

Is there a write up of this type of problem for general numbers of levels? I am still trying to grok this example, and it would be great to see an end-to-end model