How to pass a coordinate to inference data

I’ve been skimming through the ArviZ documentation and came across the eight schools inference data.

import arviz as az
idata = az.load_arviz_data("centered_eight")

The InferenceData object also includes a school coordinate. How can I do that with my Bambi models? I always get a dimension that looks like ..._dim_....
Thanks!

P.S. If you want to earn a few points, you can answer it on Stack Overflow.

Welcome!

There are definitely resources on how to add them to PyMC models (e.g., here and here), but I am unsure about Bambi. @tcapretto ?

Dimensions are added automatically by Bambi depending on the levels of the variables in the model. Their names are “{variable_name}_dim” as you correctly point out.

I’m not sure I understand your problem though. Do you have an InferenceData object and you want to modify its dimensions in some way? If that’s the case, it’s more ArviZ/xarray related than Bambi related, but I’m still happy to help if you have an example.

You’re correct. However, making plots (and joins) is much less convenient without informative dimension labels. Take this simple example:

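import bambi as bmb
import pandas as pd
import arviz as az

# One row per group: y successes out of n trials
df_simple = pd.DataFrame({
    'x': ['A', 'B', 'C'],
    'y': [10, 20, 30],
    'n': [100, 100, 100]
})

# Binomial model with one coefficient per level of x (no intercept)
m = bmb.Model('p(y, n) ~ 0 + x', data=df_simple, family='binomial')
idata = m.fit(cores=4)

# Adds the posterior of the mean, 'p(y, n)_mean', to idata in place
m.predict(idata)

az.plot_forest(idata, var_names='p(y, n)_mean', combined=True)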

The plot has numbers instead of group names on the y axis. In the documentation, the forest plot has the correct y-axis labels right out of the box. What am I doing wrong, and how can I achieve that (without manually renaming the axis labels, of course)?

Got it!

If you have a look at idata.posterior you will see something like the following

Notice that the coordinate for p(y, n)_mean is p(y, n)_obs, which is just the row number. This is because Bambi doesn’t know that each row is paired with one level of x_dim (A, B, or C). You can modify the plot after creation, though: ArviZ plotting functions usually return an array of matplotlib Axes.

axes = az.plot_forest(idata, var_names='p(y, n)_mean', combined=True)
axes[0].set(yticklabels=["C", "B", "A"], ylabel="Parameter name", xlabel="Posterior distribution");

EDIT: Another option is to modify the coords in the xarray.Dataset:

idata.posterior = idata.posterior.assign_coords({"p(y, n)_obs": ["A", "B", "C"]})
az.plot_forest(idata, var_names='p(y, n)_mean', combined=True);
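
A hard-coded list is fine for three rows, but the same relabelling can be checked on a toy dataset first. The sketch below uses plain xarray and NumPy only. The random draws and the `labels` list are assumptions standing in for a real fit and for `df_simple["x"]`, and it relies on the observation dimension following the row order of the data frame, as described above.

```python
import numpy as np
import xarray as xr

# Group labels in the same row order as the data passed to the model
# (a stand-in for df_simple["x"].to_numpy() in the example above).
labels = ["A", "B", "C"]

# Toy "posterior" with the layout seen above: dims (chain, draw, obs),
# where the obs coordinate is just the row number. Values are made up.
obs_dim = "p(y, n)_obs"
rng = np.random.default_rng(0)
posterior = xr.Dataset(
    {"p(y, n)_mean": (("chain", "draw", obs_dim), rng.beta(2, 8, size=(2, 50, 3)))},
    coords={"chain": [0, 1], "draw": np.arange(50), obs_dim: np.arange(3)},
)

# Guard against a silent length mismatch before relabelling.
assert posterior.sizes[obs_dim] == len(labels)

# assign_coords returns a new Dataset; the original is left untouched.
relabelled = posterior.assign_coords({obs_dim: labels})

# Label-based selection now works with the group name.
mean_b = relabelled["p(y, n)_mean"].sel({obs_dim: "B"}).mean().item()
```

With a real fit, the same call would be applied to idata.posterior, as in the snippet above, and the result assigned back before plotting.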