How to properly index multilevel model with repeated measures data

Dekermanjian · April 18, 2024, 4:22pm

I am trying to fit a multilevel model on some longitudinal data. In this data I have ~300k observations on ~ 1400 patients. The goal is to have a multilevel model where level 0 models a patients baseline risk and level 1 models additional risk based on the repeated measures variables.

Below is my model code and an graph image of the model:

with pm.Model(coords = coords) as multilevel_model:
    # Prior
    sigma_a = pm.HalfCauchy("sigma_a", 2)

    # Patient Baseline
    gamma = pm.Normal("gamma_0", 0, 1)
    gamma_gender = pm.Normal("gamma_gender", 0, 1, dims = ["gender", "patients"])
    gamma_unit = pm.Normal("gamma_unit", 0, 1, dims = ["unit", "patients"])
    gamma_care = pm.Normal("gamma_care", 0, 1, dims = ["care", "patients"])
    gamma_age = pm.Normal("b_age", 0, 1)

    mu_a = pm.Deterministic(
        "mu_a",
        var = gamma + 
        gamma_gender[gender_idx] + 
        gamma_unit[unit_idx] + 
        gamma_care[care_idx] + 
        gamma_age*age,
        dims = ["obs", "patients"]
    )

# Unexplained Patient Variation
    epsilon_a = pm.Normal("epsilon_a", 0, 1, dims = ["obs", "patients"])
    a = pm.Deterministic("a", var=mu_a + sigma_a * epsilon_a, dims = ["obs", "patients"])
    
    #  Common slopes priors
    b_time = pm.Normal("b_time", 0, 1)
    b_hematocrit = pm.Normal("b_hematocrit", 0, 1)
    b_min_bp = pm.Normal("b_min_bp", 0, 1)
    b_max_bp = pm.Normal("b_max_bp", 0, 1)
    b_min_temp = pm.Normal("b_min_temp", 0, 1)
    b_max_temp = pm.Normal("b_max_temp", 0, 1)
    b_min_rsp = pm.Normal("b_min_rsp", 0, 1)
    b_max_rsp = pm.Normal("b_max_rsp", 0, 1)
    b_min_pls = pm.Normal("b_min_pls", 0, 1)
    b_max_pls = pm.Normal("b_max_pls", 0, 1)
    b_min_sgr = pm.Normal("b_min_sgr", 0, 1)
    b_max_sgr = pm.Normal("b_max_sgr", 0, 1)
    b_min_pn = pm.Normal("b_min_pn", 0, 1)
    b_max_pn = pm.Normal("b_max_pn", 0, 1)
    
    # Expectation
    p = pm.Deterministic(
        "p", 
        pm.math.invlogit(
            pm.math.mean(a[:, patient_idx]) + 
            b_time*time +
            b_hematocrit*hematocrit +
            b_min_bp*min_BP +
            b_max_bp*max_BP +
            b_min_temp*min_temp +
            b_max_temp*max_temp +
            b_min_rsp*min_rsp +
            b_max_rsp*max_rsp +
            b_min_pls*min_pls + 
            b_max_pls*max_pls +
            b_min_sgr*min_sgr +
            b_max_sgr*max_sgr +
            b_min_pn*min_pn +
            b_max_pn*max_pn
        ),
        dims="obs"
    )
    likelihood = pm.Bernoulli("likelihood", p = p, observed = transferred, dims="obs")

I keep getting shape error messages when I try to sample this model.

Any tips/advice would really be appreciated.

jessegrabowski · April 18, 2024, 5:32pm

The best way is to use the .eval() method. This allows you to eagerly compute a single variable in your model. You should check that all of these different components have the correct shapes with e.g. (b_time * time).shape.eval(), (b_hematocrit*hematocrit).shape.eval(), and so on.

Dekermanjian · April 18, 2024, 6:01pm

Thank you for that suggestion @jessegrabowski I appreciate your help. If you don’t mind me asking you, does it make sense the way I had to create 2 dimensional parameters in level 0 to address the fact that there are more observations than patients (repeated measures) in my data?

jessegrabowski · April 18, 2024, 6:05pm

I would suggest you do everything in long form, so instead of having a matrix of dimensions (observation, patient), you just have a vector of length n_obs * n_patients with basically a multi-index. I’m not sure if you have a balanced panel or not, but I find it’s easier to index into long-form data.

I’m also leery of the fact that you’re expanding the a variable with patient_idx just to take the mean of it. That doesn’t seem correct.

Dekermanjian · April 18, 2024, 6:30pm

@jessegrabowski Yes, I am also quite concerned about taking the mean of a. It was the only way I could get it to not throw out an error of p being multidimensional. Do you have a reference I could check out on how to index into long-form data? The data I have is already in long-form each observation is a row. When I was implementing the pymc code I did initially try to only index my level 0 parameters like b_gender only by dims=“gender” but when I try to construct the a variable that has dims = “patients” it would complain that it could not broadcast variables.

edit:
I think I was able to make some progress thanks to your suggestions. I am now able to sample from the following model:

with pm.Model(coords = coords) as multilevel_model:
    # Prior
    sigma_a = pm.HalfCauchy("sigma_a", 2)

    # Patient Baseline
    gamma = pm.Normal("gamma_0", 0, 1)
    gamma_gender = pm.Normal("gamma_gender", 0, 1, dims = "gender")
    gamma_unit = pm.Normal("gamma_unit", 0, 1, dims = "unit")
    gamma_care = pm.Normal("gamma_care", 0, 1, dims = "care")
    gamma_age = pm.Normal("b_age", 0, 1)

    mu_a = pm.Deterministic(
        "mu_a",
        var = gamma + 
        gamma_gender[gender_idx] + 
        gamma_unit[unit_idx] + 
        gamma_care[care_idx] + 
        gamma_age*age
    )

    # Unexplained Patient Variation
    epsilon_a = pm.Normal("epsilon_a", 0, 1)
    a = pm.Deterministic("a", var=mu_a + sigma_a * epsilon_a)
    
    # Common slopes priors
    b_time = pm.Normal("b_time", 0, 1)
    b_hematocrit = pm.Normal("b_hematocrit", 0, 1)
    b_min_bp = pm.Normal("b_min_bp", 0, 1)
    b_max_bp = pm.Normal("b_max_bp", 0, 1)
    b_min_temp = pm.Normal("b_min_temp", 0, 1)
    b_max_temp = pm.Normal("b_max_temp", 0, 1)
    b_min_rsp = pm.Normal("b_min_rsp", 0, 1)
    b_max_rsp = pm.Normal("b_max_rsp", 0, 1)
    b_min_pls = pm.Normal("b_min_pls", 0, 1)
    b_max_pls = pm.Normal("b_max_pls", 0, 1)
    b_min_sgr = pm.Normal("b_min_sgr", 0, 1)
    b_max_sgr = pm.Normal("b_max_sgr", 0, 1)
    b_min_pn = pm.Normal("b_min_pn", 0, 1)
    b_max_pn = pm.Normal("b_max_pn", 0, 1)

# Expectation
    p = pm.Deterministic(
        "p", 
        pm.math.invlogit(
            a[patient_idx] + 
            b_time*time +
            b_hematocrit*hematocrit +
            b_min_bp*min_BP +
            b_max_bp*max_BP +
            b_min_temp*min_temp +
            b_max_temp*max_temp +
            b_min_rsp*min_rsp +
            b_max_rsp*max_rsp +
            b_min_pls*min_pls + 
            b_max_pls*max_pls +
            b_min_sgr*min_sgr +
            b_max_sgr*max_sgr +
            b_min_pn*min_pn +
            b_max_pn*max_pn
        ),
        dims="obs"
    )
    likelihood = pm.Bernoulli("likelihood", p = p, observed = transferred, dims="obs")

The only piece that I am still struggling with is the variable “epsilon_a” which represents the unobserved variation between patients. I am unable to set the dims of “epsilon_a” to be the number of patients in the data set. I get broadcasting errors if I do.

jessegrabowski · April 18, 2024, 7:03pm

You will need to make a mapping from patients to rows of data. Assuming you have a column in your dataset like patient_id, you can do patient_idx, patients = pd.factorize(df.patient_id), then give epsilon_a the dimension patient and expand it to the number of rows with epsilon_a[patient_idx]

Dekermanjian · April 18, 2024, 7:08pm

@jessegrabowski Thank you very much! Your suggestion fixed the problems I was having with epsilon_a. Previously, when I was assigning epsilon_a dims of patients I was not indexing epsilon_a[patient_idx]. I apologize for this amateur mistake on my part. I do appreciate your help, again thank you!

jessegrabowski · April 18, 2024, 7:20pm

No worries, 99.9% of errors are shape errors. Don’t be shy about asking for help when you get stumped!

Dekermanjian · April 18, 2024, 7:28pm

Thanks man, honestly feeling real embarrassed by my mistake. I appreciate your help and kindness. Have a nice rest of your day!

Topic		Replies	Views
Multi-Multilevel Modeling Questions	9	2763	February 15, 2023
Bayesian multilevel model indexing problems Questions	1	514	June 4, 2021
Multilevel model: how to model outcome at the group level Questions	3	945	December 28, 2018
Hierarchical model with predictors with uncertainty at a different level than rest of data Questions	1	513	July 11, 2020
How to model a multivariate multilevel linear model Questions	6	644	September 11, 2019

How to properly index multilevel model with repeated measures data

Related topics