Hi all,
a colleague and I are trying to set up a GLMM in Bambi. In the past we have used explicit hierarchical models in PyMC but not worked much much GLMs. We came across a few questions and problems and would be grateful for any hints and ideas.
I’ll first describe the experimental design we’re trying to model. Unfortunately, it’s a bit more complex than most examples
The experiment has two within-participants experimental conditions and was conducted in two sessions (also within participants). A third factor ‘task’ with three levels was varied between participants. So it’s a “condition x session x task (between)” design with all factors being categorical. The response variable is a count. And we want ‘participant’ as random effect grouping variable.
We thought this translates to this model:
model = bmb.Model(
'''
p(correct, count) ~ session*condition*task
+ (session+task+condition|task/participant)
''',
data, family='binomial')
with session*condition * task
creating the main effects and interactions and + (session+task+condition|task/participant)
adding the random effects. session+task+condition
should only create random effects for the main effects as we think the data does not afford random interaction effects. What we are unsure about is the |task/participant
part. Since the task was varied between participants, we cannot simply have ‘participant’ as grouping. Here are some concrete questions:
-
is
|task/participant
the correct construct to implement task-specific random effects? The participant identifier is unique across the three tasks -
In the results we see posteriors for every participant in every task (also those in which they did not participated). Is this the model imputing the performance of participants in tasks they had not done, treating it like missing data?
-
Point (2) is not a problem but expected behavior, right?
-
The group-level effects generated by Bambi look like this:
Group-level effects 1|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.7505)) 1|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.7505)) session|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0)) session|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0)) task|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: [5.356 5.2789])) task|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: [5.356 5.2789])) condition|task ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0)) condition|task:participant ~ Normal(mu: 0.0, sigma: HalfNormal(sigma: 5.0))
If we get it correctly, the
1|
are the random intercepts and the other ones the slopes/effects relative to the reference category. Thex|task
andx|task:participant
seem to represent a task specific random effect plus a task–participant interaction random effect. I just can’t get my head around whether this does what we want (task-specific partial pooling via the participant grouping factor). And in particular, thetask|task
component looks weird. Is this all correct and can perhaps someone provide an intuitive explanation of what these components mean? -
To see whether the results made sense we tried to calculate the marginal effects of some factors by summing fixed effect components and then using an inverse logit to bring it back to the response scale. Some results seemed off, perhaps because we ignored the random effects. We then were happy to find the “plot_comparisons” and “plot_predictions” methods in Bambi. They work great for versions of the model without random effects. But as soon as we bring in the random effects, they fail with a key error ‘participant’. Is there anything to consider when using these functions with random effects models (or is this not possible yet)? Or is this related to the nested task/participant thing?
Any help with this would be highly appreciated! Bambi is such cool tool and I think for us it has a lot of potential for speeding up modelling in many scenarios.
Error caused by ‘plot_predictions’:
KeyError Traceback (most recent call last)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/indexes/base.py:3790, in Index.get_loc(self, key)
3789 try:
-> 3790 return self._engine.get_loc(casted_key)
3791 except KeyError as err:
File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()
File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()
File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()
File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'participant'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
Cell In[30], line 5
1 trace = az.from_netcdf('traces/DistractorInterference2.nc')
3 display(trace)
----> 5 bmb.interpret.plot_predictions(
6 model=model,
7 idata=trace,
8 covariates=["session", "condition", "task"],
9 pps=False,
10 legend=True,
11 # subplot_kwargs={"main": "hp", "group": "wt", "panel": "wt"},
12 fig_kwargs={"figsize": (20, 8), "sharey": True}
13 )
14 plt.tight_layout();
16 1/0
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/interpret/plotting.py:168, in plot_predictions(model, idata, covariates, target, pps, use_hdi, prob, transforms, legend, ax, fig_kwargs, subplot_kwargs)
165 if transforms is None:
166 transforms = {}
--> 168 cap_data = predictions(
169 model,
170 idata,
171 covariates,
172 target=target,
173 pps=pps,
174 use_hdi=use_hdi,
175 prob=prob,
176 transforms=transforms,
177 )
179 response_name = get_aliased_name(model.response_component.response_term)
180 covariates = get_covariates(covariates)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/interpret/effects.py:503, in predictions(model, idata, covariates, target, pps, use_hdi, prob, transforms)
501 y_hat_mean = y_hat.mean(("chain", "draw"))
502 else:
--> 503 idata = model.predict(idata, data=cap_data, inplace=False)
504 y_hat = response_transform(idata.posterior[response.name_target])
505 y_hat_mean = y_hat.mean(("chain", "draw"))
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/models.py:826, in Model.predict(self, idata, kind, data, inplace, include_group_specific, sample_new_groups)
823 else:
824 var_name = f"{response_aliased_name}_{name}"
--> 826 means_dict[var_name] = component.predict(
827 idata, data, include_group_specific, hsgp_dict, sample_new_groups
828 )
830 # Drop var/dim if already present. Needed for out-of-sample predictions.
831 if var_name in idata.posterior.data_vars:
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/model_components.py:190, in DistributionalComponent.predict(self, idata, data, include_group_specific, hsgp_dict, sample_new_groups)
185 linear_predictor += self.predict_common(
186 posterior, data, in_sample, to_stack_dims, design_matrix_dims, hsgp_dict
187 )
189 if self.design.group and include_group_specific:
--> 190 linear_predictor += self.predict_group_specific(
191 posterior, data, in_sample, to_stack_dims, design_matrix_dims, sample_new_groups
192 )
194 # Sort dimensions
195 linear_predictor = linear_predictor.transpose(*linear_predictor_dims)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/bambi/model_components.py:318, in DistributionalComponent.predict_group_specific(self, posterior, data, in_sample, to_stack_dims, design_matrix_dims, sample_new_groups)
316 fm_eval_unseen_categories_original = fm.config["EVAL_UNSEEN_CATEGORIES"]
317 fm.config["EVAL_UNSEEN_CATEGORIES"] = "silent"
--> 318 group = self.design.group.evaluate_new_data(data)
319 fm.config["EVAL_UNSEEN_CATEGORIES"] = fm_eval_unseen_categories_original
321 Z = group.design_matrix
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/matrices.py:405, in GroupEffectsMatrix.evaluate_new_data(self, data)
402 factors_with_new_levels = []
404 for term in self.terms.values():
--> 405 term_matrix = term.eval_new_data(data)
406 delta = term_matrix.shape[1] if term_matrix.ndim == 2 else 1
407 matrices_to_stack.append(term_matrix)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:700, in GroupSpecificTerm.eval_new_data(self, data)
683 """Evaluates the term with new data.
684
685 Converts the variable in ``factor`` to the type remembered from the first evaluation and
(...)
697 Zi: np.ndarray
698 """
699 Xi = self.expr.eval_new_data(data)
--> 700 Ji = self.factor.eval_new_data(data)
702 # If a row contains ALL zeroes, then it indicates that is a new, unseen, group.
703 all_zeros = ~Ji.any(axis=1)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:493, in Term.eval_new_data(self, data)
476 """Evaluates the term with new data.
477
478 Calls ``.eval_new_data()`` method on each component in the term and combines the results
(...)
489 The values resulting from evaluating this term using the new data.
490 """
491 if self.kind == "interaction":
492 result = reduce(
--> 493 get_interaction_matrix, [c.eval_new_data(data) for c in self.components]
494 )
495 else:
496 result = self.components[0].eval_new_data(data)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/terms.py:493, in <listcomp>(.0)
476 """Evaluates the term with new data.
477
478 Calls ``.eval_new_data()`` method on each component in the term and combines the results
(...)
489 The values resulting from evaluating this term using the new data.
490 """
491 if self.kind == "interaction":
492 result = reduce(
--> 493 get_interaction_matrix, [c.eval_new_data(data) for c in self.components]
494 )
495 else:
496 result = self.components[0].eval_new_data(data)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/formulae/terms/variable.py:191, in Variable.eval_new_data(self, data_mask)
173 def eval_new_data(self, data_mask):
174 """Evaluates the variable with new data.
175
176 This method evaluates the variable within a new data mask. If this object is categorical,
(...)
189 categoric ones.
190 """
--> 191 x = data_mask[self.name]
192 if self.kind == "numeric":
193 return self.eval_new_data_numeric(x)
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/frame.py:3896, in DataFrame.__getitem__(self, key)
3894 if self.columns.nlevels > 1:
3895 return self._getitem_multilevel(key)
-> 3896 indexer = self.columns.get_loc(key)
3897 if is_integer(indexer):
3898 indexer = [indexer]
File ~/.conda/envs/BambiPython_new/lib/python3.9/site-packages/pandas/core/indexes/base.py:3797, in Index.get_loc(self, key)
3792 if isinstance(casted_key, slice) or (
3793 isinstance(casted_key, abc.Iterable)
3794 and any(isinstance(x, slice) for x in casted_key)
3795 ):
3796 raise InvalidIndexError(key)
-> 3797 raise KeyError(key) from err
3798 except TypeError:
3799 # If we have a listlike key, _check_indexing_error will raise
3800 # InvalidIndexError. Otherwise we fall through and re-raise
3801 # the TypeError.
3802 self._check_indexing_error(key)
KeyError: 'participant'