Highly correlated variables

Hi all,

I’m not sure if this title is fitting but I don’t know how else to concisely describe the issue. I’m using Bambi but I guess that’s not really relevant here.

I have a model with a continuous outcome y and two categorical input variables x1 and x2. x1 has two categories, a and b, and x2 also has two categories, 0 and 1. I want to model an interaction effect between x1 and x2.

So the model formula would be y ~ x1 + x2:x1

The problem is that all data points in category a of variable x1 have a 0 value in variable x2. For category b there are both 0 and 1 values. I am not interested in getting an estimate for the x2 parameter in the case of x1=a, but in theory these data could exist. I started out with a hierarchical model:

y ~ x1 + x2|x1

which kinda worked, the problem is that I’m getting very wide HDIs for both the x1 and the x2 parameter in the case of x1=a (I don’t really care about the x2 parameter here though). So the next thing I did was using the first formula and setting a very narrow prior for x2 when x1=a and a flatter prior for x2 when x1=b. This seemed to have worked, i.e. the x1 parameter for when x1=a does not have a huge HDI anymore, but I’m wondering if this is a legitimate approach or if there’s a better way to deal with such a situation?

Hope things are clear, let me know if not. Thanks!

What do the estimates for the x2 parameter look like after using a narrow prior? Are the HDIs smaller and centered away from zero?

I’m not sure I understand the problem completely. Perhaps it would be better if you could share a reproducible example. But what happens if you try the following model?

y ~ 0 + x1:x2

Edit: See this example

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd

data = pd.DataFrame(
        "y": np.random.normal(size=8),
        "x1": ["a"] * 4 + ["b"] * 4,
        "x2": ["0"] * 4 + ["0", "1"] * 2

priors = {"x1:x2": bmb.Prior("Normal", mu=0, sigma=1)}
model = bmb.Model("y ~ 0 + x1:x2", data, priors=priors)
idata = model.fit()
az.summary(idata, kind="stats")


az.plot_trace(idata, backend_kwargs={"tight_layout": True});

1 Like

Hi both, thanks for your answers, I hadn’t realized that there were replies… I think the given solution will work for me, I somehow assumed that the model would crash when trying to estimate x1:x2[a, 1] as there are no data, so I didn’t even try it like that. Thanks!

1 Like