When fitting a hierarchical model, Bambi's model.predict() will not accept new data that does not belong to one of the existing groups, so it seems one cannot make an out-of-sample prediction unless it is for one of the existing groups. I have been doing it manually, which is fine, but is there some way to do this that I am missing? posterior_predict from rstanarm does support this, for example.
At the moment Bambi does not allow making predictions using a new dataset that contains groups that were not observed in the training process.
This behavior is related to how our formula library works, which doesn’t allow us to derive a new design matrix when there are unseen levels. We could handle this in Bambi, and it’s something I’m interested in doing. However, I have other priorities at the moment so this will come after I finish with other changes.
Out of curiosity, how are you doing it manually?
I just use the posterior parameter draws and simulate the model. For the relatively simple models I am using, this is pretty straightforward. For example, if I were doing linear regression with random intercepts, y ~ x + (1|group), the posterior would have draws for Intercept, x, 1|group, 1|group_sigma and y_sigma.
I can use these directly to generate predictions for a single member of a new group. I use the draws for any one of the existing groups to do this, since I will not be using the 1|group effect itself. Something like this (for a new member with x = 4):
import numpy as np

# Convert to a dataframe and pick a group, any group; the group-specific effect won't be used.
samples = results.posterior.to_dataframe().query('group__factor_dim == "A"').reset_index()
# Simulate y for a new member of an unseen group with x = 4:
# fixed effects + a fresh group offset + observation noise.
samples['y_pred'] = samples['Intercept'] + samples['x'] * 4 + \
    np.random.normal(scale=samples['1|group_sigma']) + \
    np.random.normal(scale=samples['y_sigma'])
# Yes, I know I could combine these calls to np.random.normal.
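For anyone who wants to try the idea without a fitted model, here is a minimal self-contained sketch of the same simulation. The posterior draws are made-up stand-ins generated with NumPy (they do not come from Bambi), and predict_new_group is a hypothetical helper name, not part of any library. It assumes the random-intercept model y ~ x + (1|group) with normal group effects and normal noise.

```python
import numpy as np


def predict_new_group(intercept, slope, group_sigma, y_sigma, x_new, rng):
    """Return one simulated y per posterior draw for a member of an unseen group.

    Hypothetical helper: each argument except x_new and rng is a 1-D array
    holding one value per posterior draw.
    """
    intercept = np.asarray(intercept, dtype=float)
    slope = np.asarray(slope, dtype=float)
    group_sigma = np.asarray(group_sigma, dtype=float)
    y_sigma = np.asarray(y_sigma, dtype=float)

    # The new group's intercept offset is unknown, so draw it fresh
    # from N(0, group_sigma) for every posterior draw.
    group_effect = rng.normal(loc=0.0, scale=group_sigma)
    # Observation noise for the new member.
    noise = rng.normal(loc=0.0, scale=y_sigma)
    return intercept + slope * x_new + group_effect + noise


rng = np.random.default_rng(0)
n_draws = 4000

# Made-up stand-ins for posterior draws (assumed values, for illustration only).
intercept = rng.normal(1.0, 0.1, n_draws)
slope = rng.normal(2.0, 0.05, n_draws)
group_sigma = np.abs(rng.normal(0.5, 0.05, n_draws))
y_sigma = np.abs(rng.normal(1.0, 0.05, n_draws))

y_pred = predict_new_group(intercept, slope, group_sigma, y_sigma, x_new=4.0, rng=rng)

# Summarise the predictive distribution for the new member.
print("mean:", y_pred.mean())
print("95% interval:", np.percentile(y_pred, [2.5, 97.5]))
```

The two normal draws could be merged into a single draw with scale sqrt(group_sigma**2 + y_sigma**2), since the sum of two independent zero-mean normals is normal with the variances added. Keeping them separate just makes the two sources of uncertainty easier to see.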