Levels with varying dimension sizes

[Edited to make example cleaner]

I am new to pymc, and trying something out with mock dataset. I have a model with nested predictors (mouse, neuron). The model below assumes the same number of neurons per mouse. In pymc, how does one handle the case where there are a different dimension sizes for levels of a predictor, in my case different different number of neurons per mouse?

with pm.Model() as model:
    intercept_m = pm.Normal('intercept_m', mu=0, sd=10, shape=n_mouse)
    
    sigma_n = pm.Exponential('sigma_n', 1)
    z_intercept_n = pm.Normal('z_intercept_n', mu=0, sd=1, shape=(n_neuron, n_mouse))
    intercept_n = pm.Deterministic('intercept_n', intercept_m + z_intercept_n * sigma_n)
    
    slope = pm.Normal('slope', mu=0, sd=10)
    
    theta = intercept_n[df.neuron, df.mouse] + slope * df.measure
    outcome = pm.Bernoulli("outcome", 
                           p=pm.invlogit(theta), 
                           observed=df.outcome)
1 Like

I found that I could do manual looping over the mouse, using a count of neurons per mouse. The solution below creates a Bernoulli variable for each mouse that is of a different length for each mouse (rather than an array of Bernoulli values, which must be non-jagged). From what I can tell in some quick experiments, the posteriors all look reasonable to me.

with pm.Model() as model:
    intercept = pm.Normal('intercept', mu=0, sd=10, shape=n_mouse)
    
    sigma = pm.Exponential('sigma_n', 1)
    slope = pm.Normal('slope', mu=0, sd=10)
    
    for midx, n_neuron in enumerate(n_neurons):
        z_intercept_n = pm.Normal(f'z_intercept_{midx}', mu=0, sd=1, shape=n_neuron)
        intercept_n = pm.Deterministic(f'intercept_{midx}', intercept[midx] + z_intercept_n * sigma)
        
        df_m = df[df['mouse'] == midx]
        theta = intercept_n[df_m.neuron] + slope * df_m.measure
        outcome = pm.Bernoulli(f"outcome_{midx}", 
                               p=pm.invlogit(theta), 
                               observed=df_m.outcome)
1 Like

Iā€™m assuming your dataset is in long form, one row per neuron.

Try changing the shape of z_intercept_n to be n_neuron:

z_intercept_n = pm.Normal('z_intercept_n', mu=0, sd=1, shape=n_neuron)

Then index into mice when defining intercept_n:

intercept_n = pm.Deterministic('intercept_n', intercept_m[df.mouse] + z_intercept_n * sigma_n)

Then
theta = intercept_n + slope * df.measure