Understanding multilevel model coefficients

I’m having difficulty understanding what the coefficients in a multilevel model mean.

Take this model from Section 13.31 (cell 38) of McElreath’s Statistical Rethinking that I’m stuck on. The Jupyter notebook is here.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm

RANDOM_SEED = 8927  # seed used by pm.sample below; the notebook defines this in an earlier cell

d = pd.read_csv("Data/chimpanzees.csv", sep=";")

treatment = (d.prosoc_left + 2 * d.condition).values
Ntreatments = len(np.unique(treatment))

actor = (d.actor - 1).astype(int).values
Nactor = len(np.unique(actor))

block = (d.block - 1).astype(int).values
Nblock = len(np.unique(block))

with pm.Model() as m_13_4:
    a_bar = pm.Normal("a_bar", 0.0, 1.5)
    sigma_a = pm.Exponential("sigma_a", 1.0)
    sigma_g = pm.Exponential("sigma_g", 1.0)

    a = pm.Normal("a", a_bar, sigma_a, shape=Nactor)
    g = pm.Normal("g", 0.0, sigma_g, shape=Nblock)
    b = pm.Normal("b", 0.0, 0.5, shape=Ntreatments)

    actor_ = pm.Data("actor", actor)
    block_ = pm.Data("block", block)
    treatment_ = pm.Data("treatment", treatment)
    #data = pm.Data("data", d, mutable=True)
    p = pm.Deterministic("p", pm.math.invlogit(a[actor_] + g[block_] + b[treatment_]))
    pulled_left = pm.Binomial("pulled_left", 1, p, observed=d.pulled_left.values)

    trace_13_4 = pm.sample(tune=3000, target_accept=0.99, random_seed=RANDOM_SEED)

az.summary(trace_13_4)

My questions are:

  1. I understand that each coefficient is the additive effect on the log-odds of pulling the left lever. However, when we look at the coefficient for a particular level, say a[0], do we ignore the a_bar coefficient? Or do we actually add the a_bar and a[0] coefficients to determine the effect?
  2. It appears that specifying p = pm.Deterministic outputs an intercept for every single row in the data. We can build the model without the pm.Deterministic. So what’s the reason for using it?
  3. How do I interpret the p coefficients in relation to the coefficients of the other variables?
  1. a[0] is the parameter associated with the first actor. a_bar is a separate parameter that is used as the mean of the prior on a[0] (and a[1], etc.). Parameters like a_bar (and sigma_a, etc.) are sometimes referred to as hyperparameters. So the influence of a_bar on a[0] is already taken care of during model specification, and the posterior samples you see of a[0] can be interpreted without adding a_bar or anything like that (see the short sketch after this list).

  2. Creating a pm.Deterministic saves the values into the trace. If you don’t want to save them, then you just do:

pulled_left = pm.Binomial("pulled_left",
                          1,
                          pm.math.invlogit(a[actor_] + 
                                           g[block_] + 
                                           b[treatment_]
                          ), 
                          observed=d.pulled_left.values
)
  3. If you are unfamiliar with interpreting coefficients in GLMs, I would poke around for something like this or this until you find one that makes sense to you.
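
To make point 1 concrete, here is a minimal sketch (assuming the trace_13_4 object sampled above). The a[0] samples in the trace are the complete intercept for actor 0 on the log-odds scale; a_bar only describes the population those intercepts were drawn from.

import arviz as az
import matplotlib.pyplot as plt

post = trace_13_4.posterior

a_bar_samples = post["a_bar"].values.flatten()    # population mean of the actor intercepts
a0_samples = post["a"].values[:, :, 0].flatten()  # actor 0's intercept; use as-is, do NOT add a_bar

print("a_bar mean:", a_bar_samples.mean())
print("a[0] mean: ", a0_samples.mean())

# a forest plot shows each a[j] shrinking toward a_bar, which is a_bar's only role here
az.plot_forest(trace_13_4, var_names=["a_bar", "a"], combined=True)
plt.show()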

Thanks for the guidance. Clear on point 1.

Regarding point 2, what is the purpose of saving the values to a trace?
Regarding point 3, I understand the additive effect of the coefficients on the log-odds of the response variable. What I meant by my question is: if we specify pm.Deterministic, then PyMC will create a separate coefficient for every row in the data.
(a). What does this coefficient mean in relation to the other variables (in this case the actor, block and treatment)? I suppose that if we’re looking at the coefficient for a particular row, then all the other coefficients don’t come into play.
(b). Why would we want this? It seems like overfitting.

There are lots of different reasons. Maybe you want to inspect (e.g., plot) many of the quantities in your posterior, including things like p. If p isn’t saved in your trace, then you have to calculate it yourself (using the other parameters in the trace) and then plot it yourself.
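
For example, a rough sketch (again assuming trace_13_4 from the model above): because the Deterministic stored p, its samples can be summarized or plotted straight from the trace.

import arviz as az
import matplotlib.pyplot as plt

post = trace_13_4.posterior
p0_samples = post["p"].values[:, :, 0]   # all MCMC samples of p for the first observation

az.plot_posterior({"p0": p0_samples})    # plot it directly, no recomputation from a, g, and b needed
plt.show()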

Can we see how you are inspecting your data? It’s tough to know what “a row” is without knowing how you are looking at your posterior.

Sure, I’m asking about the coefficients of p:

[screenshot of the az.summary table, showing a row for every element of p: p[0], p[1], …]

I understand the coefficient for a is for the individual (aka actor), b is for the treatment, and g is for the block, but I’m not following what p is.

Do you have 504 observations? I think each of those p values corresponds to a single observation (and results from that observation’s particular combination of actor, block, and treatment).

Yes, there are 504 observations and every single datapoint gets an intercept. I understand what you’re saying, and it does seem to be the case, because some of the values for different datapoints are the same (i.e., when the actor, treatment, and block are the same). But I can’t figure out how to calculate p from the coefficients of the other variables. For example:

a[0] = -0.368
g[0] = -0.165
b[0] = -0.133

The sum is -0.666, but I don’t see that value among any of the intercepts listed for p.

Or is the p calculation totally separate and has no relation to the a, g, b intercepts?

We can follow your model code to see what we should expect.

The sum you calculate is for actor=0, block=0, and treatment=0. However, it’s not clear if that combination corresponds to any of your observations (and if it does, it’s not clear which observation(s) it corresponds to). If we want to see the p calculated for the first observation (i.e., d.pulled_left.values[0]), then we need to figure out what actor/block/treatment is associated with that first observation. The combination should be actor_[0] and block_[0] and treatment_[0]. Thus, the sum should be:

a[actor_[0]] + g[block_[0]] + b[treatment_[0]]

Your model immediately transforms this sum using pm.math.invlogit(). So p[0] should be

p[0] = 1 / (1 + exp(-(a[actor_[0]] + g[block_[0]] + b[treatment_[0]])))

Or, you can just calculate the values for all observations all at once:

p = 1 / (1 + exp(-(a[actor_] + g[block_] + b[treatment_])))

and check the value associated with the first observation, p[0].
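
As a quick check, here is a sketch (assuming trace_13_4 and the actor/block/treatment index arrays defined above) that rebuilds p for every observation from the posterior samples of a, g, and b and confirms it matches the p recorded by the Deterministic:

import numpy as np
from scipy.special import expit  # expit(x) = 1 / (1 + exp(-x)), the inverse logit

post = trace_13_4.posterior
a_s = post["a"].values  # shape (chain, draw, Nactor)
g_s = post["g"].values  # shape (chain, draw, Nblock)
b_s = post["b"].values  # shape (chain, draw, Ntreatments)

# index each draw by the observation's actor/block/treatment, then apply the link
p_rebuilt = expit(a_s[:, :, actor] + g_s[:, :, block] + b_s[:, :, treatment])

# should agree with the stored p samples up to floating-point error
np.testing.assert_allclose(p_rebuilt, post["p"].values, rtol=1e-6)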

Thanks.

I used an equivalent formulation of the inverse logit:

p =  exp(a[actor_] + g[block_] + b[treatment_]) / (1 + exp(a[actor_] + g[block_] + b[treatment_]))

Numbers almost work when I plug them in like this:

az_res = az.summary(trace_13_4)
# treatment for 0th datapoint is 0
exp_res = np.exp(az_res.loc["a[0]", "mean"] + az_res.loc["b[0]", "mean"] + az_res.loc["g[0]", "mean"])
p0_res = exp_res / (1+exp_res)

p0_res is 0.339 while az_res.loc["p[0]", "mean"] is 0.344. So quite close but not exact. What could be the cause of the difference?

Hi Matsuo,

I think they are two different means. They should usually be close but you shouldn’t expect them to always match.

az_res.loc["p[0]", "mean"] is the mean of the MCMC samples. As the model samples parameters, it pushes those samples through the link function to get samples of p. Then we can just summarize the distribution of those samples with a mean (or an HDI, if you like).

For p0_res, you averaged the samples for each parameter and then pushed those averages through to get the answer. So p0_res shouldn’t have an HDI because it’s not a distribution.

One way the averages can come apart is if there are a few extreme or unusual samples sitting around in the posterior distribution for a or b or g. In the az_res.loc["p[0]", "mean"] case, those extreme values get pushed through to make more extreme values in p. But when you average earlier on in the process, as with p0_res, odd features of the posterior distribution can disappear. Anyway, I think it’s better to push the whole distribution of a, b, and g through to find out about the behavior of p.
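
A tiny, model-free illustration of how the two means can come apart (plain NumPy, with made-up log-odds samples): because the inverse logit is nonlinear, the mean of the transformed samples is generally not the transform of the mean.

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
x = rng.normal(loc=-0.7, scale=1.5, size=10_000)  # pretend these are posterior samples of the log-odds

mean_of_transformed = expit(x).mean()  # analogous to az.summary's mean of the p samples
transform_of_mean = expit(x.mean())    # analogous to p0_res: average first, then transform

print(mean_of_transformed, transform_of_mean)  # close, but not identical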
