Questions concerning random effects model in Bambi

Hi all,

A colleague and I are trying to set up a GLMM in Bambi. In the past we have built explicit hierarchical models in PyMC but have not worked much with GLMs. We came across a few questions and problems and would be grateful for any hints and ideas.

I’ll first describe the experimental design we’re trying to model. Unfortunately, it’s a bit more complex than most examples :sweat_smile:

The experiment has two within-participants experimental conditions and was conducted in two sessions (also within participants). A third factor ‘task’ with three levels was varied between participants. So it’s a “condition x session x task (between)” design with all factors being categorical. The response variable is a count, and we want ‘participant’ as the random-effect grouping variable.

We thought this translates to this model:

model = bmb.Model(
    '''
    p(correct, count) ~ session*condition*task 
                        + (session+task+condition|task/participant)
    ''',
    data, family='binomial')

with session*condition*task creating the main effects and interactions, and + (session+task+condition|task/participant) adding the random effects. session+task+condition should create random effects only for the main effects, as we think the data does not afford random interaction effects. What we are unsure about is the |task/participant part. Since task was varied between participants, we cannot simply use ‘participant’ as the grouping factor.
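Our (possibly wrong) understanding of what the nesting shorthand does is sketched below; the expanded formula is our assumption of how formulae interprets it, not something Bambi printed:

model = bmb.Model(
    '''
    p(correct, count) ~ session*condition*task
                        + (session+task+condition|task)
                        + (session+task+condition|task:participant)
    ''',
    data, family='binomial')
# i.e. we assume (x | task/participant) is equivalent to (x | task) + (x | task:participant):
# one set of group-specific effects per task, plus one per task-participant cell

Here are some concrete questions: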

  1. Is |task/participant the correct construct to implement task-specific random effects? (The participant identifier is unique across the three tasks.)

  2. In the results we see posteriors for every participant in every task (also for tasks in which they did not participate). Is this the model imputing the performance of participants in tasks they had not done, treating it like missing data?

  3. Point (2) is not a problem but expected behavior, right?

  4. The group-level effects generated by Bambi look like this:

     Group-level effects
         1|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.7505))
         1|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.7505))
         session|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0))
         session|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0))
         task|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: [5.356  5.2789]))
         task|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: [5.356  5.2789]))
         condition|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0))
         condition|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0))
    

    If we understand it correctly, the 1| terms are the random intercepts and the other ones are the slopes/effects relative to the reference category. The x|task and x|task:participant terms seem to represent a task-specific random effect plus a task–participant interaction random effect. I just can’t get my head around whether this does what we want (task-specific partial pooling via the participant grouping factor). And in particular, the task|task component looks weird. Is this all correct, and can perhaps someone provide an intuitive explanation of what these components mean?

  5. To see whether the results made sense, we tried to calculate the marginal effects of some factors by summing fixed-effect components and then using an inverse logit to bring them back to the response scale (our manual calculation is sketched below). Some results seemed off, perhaps because we ignored the random effects. We were then happy to find the “plot_comparisons” and “plot_predictions” functions in Bambi. They work great for versions of the model without random effects, but as soon as we bring in the random effects, they fail with KeyError: ‘participant’. Is there anything to consider when using these functions with random-effects models (or is this not possible yet)? Or is this related to the nested task/participant thing?
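For reference, our manual marginal-effect calculation looked roughly like this (a sketch only; the coefficient and coordinate names are illustrative assumptions, not necessarily the labels Bambi uses in our posterior):

import numpy as np

post = trace.posterior

# Sum the relevant common-effect draws on the logit scale
# (term and dimension names below are placeholders for our actual labels)
eta = (
    post["Intercept"]
    + post["session"].sel(session_dim="2")
    + post["condition"].sel(condition_dim="B")
)

# Map back to the probability scale with the inverse logit
p_hat = 1 / (1 + np.exp(-eta))
print(float(p_hat.mean()))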

Any help with this would be highly appreciated! Bambi is such a cool tool, and I think for us it has a lot of potential for speeding up modelling in many scenarios.


Error caused by ‘plot_predictions’:

KeyError                                  Traceback (most recent call last)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
   3789 try:
-> 3790     return self._engine.get_loc(casted_key)
   3791 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'participant'

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[30], line 5
      1 trace = az.from_netcdf('traces/DistractorInterference2.nc')
      3 display(trace)
----> 5 bmb.interpret.plot_predictions(
      6     model=model, 
      7     idata=trace, 
      8     covariates=["session", "condition", "task"],
      9     pps=False,
     10     legend=True,
     11 #   subplot_kwargs={"main": "hp", "group": "wt", "panel": "wt"},
     12     fig_kwargs={"figsize": (20, 8), "sharey": True}
     13 )
     14 plt.tight_layout();
     16 1/0

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/interpret/plotting.py:168, in plot_predictions(model, idata, covariates, target, pps, use_hdi, prob, transforms, legend, ax, fig_kwargs, subplot_kwargs)
    165 if transforms is None:
    166     transforms = {}
--> 168 cap_data = predictions(
    169     model,
    170     idata,
    171     covariates,
    172     target=target,
    173     pps=pps,
    174     use_hdi=use_hdi,
    175     prob=prob,
    176     transforms=transforms,
    177 )
    179 response_name = get_aliased_name(model.response_component.response_term)
    180 covariates = get_covariates(covariates)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/interpret/effects.py:503, in predictions(model, idata, covariates, target, pps, use_hdi, prob, transforms)
    501     y_hat_mean = y_hat.mean(("chain", "draw"))
    502 else:
--> 503     idata = model.predict(idata, data=cap_data, inplace=False)
    504     y_hat = response_transform(idata.posterior[response.name_target])
    505     y_hat_mean = y_hat.mean(("chain", "draw"))

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/models.py:826, in Model.predict(self, idata, kind, data, inplace, include_group_specific, sample_new_groups)
    823     else:
    824         var_name = f"{response_aliased_name}_{name}"
--> 826 means_dict[var_name] = component.predict(
    827     idata, data, include_group_specific, hsgp_dict, sample_new_groups
    828 )
    830 # Drop var/dim if already present. Needed for out-of-sample predictions.
    831 if var_name in idata.posterior.data_vars:

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/model_components.py:190, in DistributionalComponent.predict(self, idata, data, include_group_specific, hsgp_dict, sample_new_groups)
    185     linear_predictor += self.predict_common(
    186         posterior, data, in_sample, to_stack_dims, design_matrix_dims, hsgp_dict
    187     )
    189 if self.design.group and include_group_specific:
--> 190     linear_predictor += self.predict_group_specific(
    191         posterior, data, in_sample, to_stack_dims, design_matrix_dims, sample_new_groups
    192     )
    194 # Sort dimensions
    195 linear_predictor = linear_predictor.transpose(*linear_predictor_dims)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/model_components.py:318, in DistributionalComponent.predict_group_specific(self, posterior, data, in_sample, to_stack_dims, design_matrix_dims, sample_new_groups)
    316 fm_eval_unseen_categories_original = fm.config["EVAL_UNSEEN_CATEGORIES"]
    317 fm.config["EVAL_UNSEEN_CATEGORIES"] = "silent"
--> 318 group = self.design.group.evaluate_new_data(data)
    319 fm.config["EVAL_UNSEEN_CATEGORIES"] = fm_eval_unseen_categories_original
    321 Z = group.design_matrix

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/matrices.py:405, in GroupEffectsMatrix.evaluate_new_data(self, data)
    402 factors_with_new_levels = []
    404 for term in self.terms.values():
--> 405     term_matrix = term.eval_new_data(data)
    406     delta = term_matrix.shape[1] if term_matrix.ndim == 2 else 1
    407     matrices_to_stack.append(term_matrix)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:700, in GroupSpecificTerm.eval_new_data(self, data)
    683 """Evaluates the term with new data.
    684 
    685 Converts the variable in ``factor`` to the type remembered from the first evaluation and
   (...)
    697 Zi: np.ndarray
    698 """
    699 Xi = self.expr.eval_new_data(data)
--> 700 Ji = self.factor.eval_new_data(data)
    702 # If a row contains ALL zeroes, then it indicates that is a new, unseen, group.
    703 all_zeros = ~Ji.any(axis=1)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:493, in Term.eval_new_data(self, data)
    476 """Evaluates the term with new data.
    477 
    478 Calls ``.eval_new_data()`` method on each component in the term and combines the results
   (...)
    489     The values resulting from evaluating this term using the new data.
    490 """
    491 if self.kind == "interaction":
    492     result = reduce(
--> 493         get_interaction_matrix, [c.eval_new_data(data) for c in self.components]
    494     )
    495 else:
    496     result = self.components[0].eval_new_data(data)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:493, in <listcomp>(.0)
    476 """Evaluates the term with new data.
    477 
    478 Calls ``.eval_new_data()`` method on each component in the term and combines the results
   (...)
    489     The values resulting from evaluating this term using the new data.
    490 """
    491 if self.kind == "interaction":
    492     result = reduce(
--> 493         get_interaction_matrix, [c.eval_new_data(data) for c in self.components]
    494     )
    495 else:
    496     result = self.components[0].eval_new_data(data)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/variable.py:191, in Variable.eval_new_data(self, data_mask)
    173 def eval_new_data(self, data_mask):
    174     """Evaluates the variable with new data.
    175 
    176     This method evaluates the variable within a new data mask. If this object is categorical,
   (...)
    189         categoric ones.
    190     """
--> 191     x = data_mask[self.name]
    192     if self.kind == "numeric":
    193         return self.eval_new_data_numeric(x)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/frame.py:3896, in DataFrame.__getitem__(self, key)
   3894 if self.columns.nlevels > 1:
   3895     return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
   3897 if is_integer(indexer):
   3898     indexer = [indexer]

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
   3792     if isinstance(casted_key, slice) or (
   3793         isinstance(casted_key, abc.Iterable)
   3794         and any(isinstance(x, slice) for x in casted_key)
   3795     ):
   3796         raise InvalidIndexError(key)
-> 3797     raise KeyError(key) from err
   3798 except TypeError:
   3799     # If we have a listlike key, _check_indexing_error will raise
   3800     #  InvalidIndexError. Otherwise we fall through and re-raise
   3801     #  the TypeError.
   3802     self._check_indexing_error(key)

KeyError: 'participant'

Hello!

I need more time to go through the problem. But regarding the last part, it’s something that should be fixed in the development version of Bambi. Can you install from the main branch on GitHub and try it again?

Note: It’s possible it also installs numpy==1.26.0 which may still cause some import problems with pytensor, so if you see that problem just install an older version of numpy (but not too old!)

Edit: the only thing I can add quickly right now is that variable + (variable | variable) doesn’t make a lot of sense, and this is what you’re doing with task. Would you be able to share (a simulated version of) your data? It would help to see how things are coded.

@tcapretto, thanks for the super quick reply and the suggestion. We installed from git about a week ago, and I have now reinstalled the newest version (0.13.0.dev according to the version string). It still shows the same error. I did not resample the model, however … I just loaded the old trace. Should we resample?

Would you be able to share (a simulated version of) your data?
Yes, that should be possible, but we need some time for that.

Concerning:
variable + (variable | variable)

I think what we want to do is:
task + (task | participant), where ‘participant’ is drawn from within that particular task

Edit: Hmm, I think you convinced me that task should not be on the left-hand side of the ( | ) term.

@tcapretto, I just simulated some data and wanted to see if the model runs through, but I now got a numpy-related error. Then I tried with the original data and got the error again. It seemed to be related to upgrading Bambi to 0.13.0.dev. I tried upgrading numpy (we were on 1.25.2) to 1.26.0, but then got the pytensor errors you anticipated, and there is no in-between numpy version to try. Here is the stack trace from the error that occurs when using Bambi 0.13.0.dev with numpy 1.25.2:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 51
     47 display(new_df)
     48 new_df.to_csv('simulated_data.csv')
---> 51 model = bmb.Model(
     52     '''p(correct, count) ~ 
     53        session*condition*task + (session+condition|task/participant)
     54     ''',
     55     new_df, family='binomial')

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/models.py:162, in Model.__init__(self, formula, data, family, priors, link, categorical, potentials, dropna, auto_scale, noncentered, center_predictors, extra_namespace)
    160     design = remove_common_intercept(design)
    161 else:
--> 162     design = fm.design_matrices(
    163         self.formula.main, self.data, na_action, 1, additional_namespace
    164     )
    166 if design.response is None:
    167     raise ValueError(
    168         "No outcome variable is set! "
    169         "Please specify an outcome variable using the formula interface."
    170     )

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/matrices.py:556, in design_matrices(formula, data, na_action, env, extra_namespace)
    553     else:
    554         raise ValueError(f"'data' contains {incomplete_rows_n} incomplete rows.")
--> 556 design = DesignMatrices(description, data, env)
    557 return design

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/matrices.py:58, in DesignMatrices.__init__(self, model, data, env)
     56 if self.model.response:
     57     self.response = ResponseMatrix(self.model.response)
---> 58     self.response.evaluate(data, env)
     60 if self.model.common_terms:
     61     self.common = CommonEffectsMatrix(self.model.common_terms)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/matrices.py:137, in ResponseMatrix.evaluate(self, data, env)
    135 self.data = data
    136 self.env = env
--> 137 self.term.set_type(self.data, self.env)
    138 self.term.set_data()
    139 self.kind = self.term.term.kind

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:820, in Response.set_type(self, data, env)
    818 def set_type(self, data, env):
    819     """Set type of the response term."""
--> 820     self.term.set_type(data, env)

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:433, in Term.set_type(self, data, env)
    431     component.set_type(data)
    432 elif isinstance(component, Call):
--> 433     component.set_type(data, env)
    434 else:
    435     raise ValueError(
    436         "Can't set type on Term because at least one of the components "
    437         f"is of the unexpected type {type(component)}."
    438     )

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/call.py:104, in Call.set_type(self, data_mask, env)
     88 """Evaluates function and determines the type of the result of the call.
     89 
     90 Evaluates the function call and sets the ``.kind`` property to ``"numeric"`` or
   (...)
    100     The environment where values and functions are taken from.
    101 """
    103 self.env = env.with_outer_namespace({**TRANSFORMS, **ENCODINGS})
--> 104 x = self.call.eval(data_mask, self.env)
    106 if is_numeric_dtype(x):
    107     self.kind = "numeric"

File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/call_resolver.py:272, in LazyCall.eval(self, data_mask, env)
    269 args = [arg.eval(data_mask, env) for arg in self.args]
    270 kwargs = {name: arg.eval(data_mask, env) for name, arg in self.kwargs.items()}
--> 272 return callee(*args, **kwargs)

TypeError: 'numpy.int64' object is not callable

@tcapretto, the error above was unrelated to the update of Bambi. Does Bambi perhaps access local variables? I have the feeling that the presence of a local variable with the same name as a variable in the model string might have caused the problem.
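If that guess is right, a minimal reproduction might look like this (a sketch only; the clashing name and the exact lookup behaviour of formulae are my assumptions):

import bambi as bmb
import pandas as pd

df = pd.DataFrame({
    "correct": [3, 5, 2, 4],
    "count": [6, 6, 6, 6],
    "session": ["1", "1", "2", "2"],
})

# A leftover integer named like the p() helper used in the response term
p = df["count"].sum()  # numpy.int64

# If formulae picks up names from the calling environment before its built-in
# transforms, `p` in "p(correct, count)" would resolve to the integer and the call
# would fail with: TypeError: 'numpy.int64' object is not callable
model = bmb.Model("p(correct, count) ~ session", df, family="binomial")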

I’m attaching a file with simulated data now. The structure is very similar to our real data, but there are fewer individuals and one factor has one level fewer, to keep computation manageable.

This is the current model (updated for some label changes):

    p(correct, count) ~ session*condition*task + (session+condition|task/individual)

We also discussed the model with colleagues and some suggest it should rather be:

    p(correct, count) ~ session*condition*task + (session+condition|individual)

I feel this neglects the possibility that the ‘individual’ group-level effects are task-dependent. But I might be mistaken. (So this comes back to question 1 in my original post.)

Many thanks again for your help!

simulated_data(1).csv (3.3 KB)

I guess this is what you want

model = bmb.Model(
    "p(correct, count) ~ 1 + session*condition + task + (1 + session*condition | individual)",
    data,
    family="binomial",
    categorical=["session", "individual"],
)

The following is the same model, but I like the parametrization a little bit more

model = bmb.Model(
    "p(correct, count) ~ 0 + session:condition + task + (0 + session:condition | individual)",
    data,
    family="binomial",
    categorical=["session", "individual"],
)

Edit: you may want to use better priors than the defaults, a larger target_accept value, and a centered parametrization. I’ve been playing with it and I got the best results in terms of ESS and R-hat this way.


priors = {
    "session:condition": bmb.Prior("Normal", mu=0, sigma=1),
    "task": bmb.Prior("Normal", mu=0, sigma=1),
    "session:condition|individual": bmb.Prior("Normal", mu=0, sigma=bmb.Prior("Gamma", alpha=3, beta=3))
}
model = bmb.Model(
    "p(correct, count) ~ 0 + session:condition + task + (0 + session:condition | individual)",
    data,
    family="binomial",
    categorical=["session", "individual"],
    priors=priors,
    noncentered=False
)

idata = model.fit(target_accept=0.9)

Many thanks!

May I ask two questions:

  • Did you remove the global intercept because the task component captures the (task-specific) intercepts? I see you have not set task to categorical — so will it not be dummy coded, giving three intercepts for it?

  • Why did you remove the session and condition slopes and keep only the interaction?

@tcapretto,

By now we have tested your suggestion and further variants. The random-effects component specified the way you suggested definitely improves the predictions (better scores in model comparison).

However, we remain unsure about the common-effects part. One point is that we are interested in the interaction with task, so we added task interactions and got the best scores when including only the session:condition:task three-way interaction and not the lower-order terms. In your suggestion you had also removed the lower-order terms in the common-effects part (i.e. the main effects of session and condition). So perhaps there is some rationale (with the suggested parametrization) for including just the highest-order effects of interest? For reference, we compared the variants roughly as sketched below.
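(A sketch of the comparison; the model names are placeholders for our actual variants:)

import arviz as az

# Fit two candidate models, keeping the pointwise log-likelihood for LOO
idata_threeway = model_threeway.fit(idata_kwargs={"log_likelihood": True})
idata_reduced = model_reduced.fit(idata_kwargs={"log_likelihood": True})

# PSIS-LOO comparison of the candidates
print(az.compare({"threeway": idata_threeway, "reduced": idata_reduced}))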

So if you could elaborate a bit on why you modeled it the way you suggested, that would be immensely helpful!

Many thanks!

Sorry for the slow response!

  1. I removed the intercept to make the session:condition interaction cell-means encoded (each coefficient represents the mean of the group given by a combination of session and condition). task will be encoded as categorical by default because it’s not numeric in the data frame.
  2. The models 1 + session + condition + session:condition and 0 + session:condition use the same information and are equivalent; the small numeric sketch below illustrates why. I prefer the second one because its parameters can be interpreted with more ease.
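
A minimal sketch (with made-up coefficient values on the logit scale) of why the two parametrizations carry the same information:

import numpy as np

# Hypothetical reference-coded coefficients for a 2x2 session-by-condition design
intercept, b_session2, b_conditionB, b_interaction = 0.2, 0.5, -0.3, 0.1

# The four cell means implied by "1 + session + condition + session:condition" ...
cell_means = np.array([
    intercept,                                              # session 1, condition A
    intercept + b_conditionB,                               # session 1, condition B
    intercept + b_session2,                                 # session 2, condition A
    intercept + b_session2 + b_conditionB + b_interaction,  # session 2, condition B
])

# ... are exactly what "0 + session:condition" estimates directly, one coefficient per cell
print(cell_means)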

Honestly, I don’t remember why I removed the interaction with task. Maybe it’s because each individual performs only a single task? It may make sense to have

    0 + session:condition:task + (0 + session:condition | individual)

but I’m not so sure about

    0 + session:condition:task + (0 + session:condition:task | individual)

as each individual is associated with a single task, correct?

 data.groupby(["individual", "task"]).size()
individual  task 
0           mixed    4
1           mixed    4
2           mixed    4
3           mixed    4
4           mixed    4
5           mixed    4
6           mixed    4
7           mixed    4
8           mixed    4
9           gray     4
10          gray     4
11          gray     4
12          gray     4
13          gray     4
14          gray     4
15          gray     4
16          gray     4
17          gray     4
18          color    4
19          color    4
20          color    4
21          color    4
22          color    4
23          color    4
24          color    4
25          color    4
26          color    4
dtype: int64

Anyway, it’s been a while, so at this point you may already have the answer to your question and I don’t remember all the details.