Highly correlated variables

leomein · November 22, 2022, 2:13pm

Hi all,

I’m not sure if this title is fitting but I don’t know how else to concisely describe the issue. I’m using Bambi but I guess that’s not really relevant here.

I have a model with a continuous outcome y and two categorical input variables x1 and x2. x1 has two categories, a and b, and x2 also has two categories, 0 and 1. I want to model an interaction effect between x1 and x2.

So the model formula would be y ~ x1 + x2:x1

The problem is that all data points in category a of variable x1 have a 0 value in variable x2. For category b there are both 0 and 1 values. I am not interested in getting an estimate for the x2 parameter in the case of x1=a, but in theory these data could exist. I started out with a hierarchical model:

y ~ x1 + x2|x1

which kinda worked, the problem is that I’m getting very wide HDIs for both the x1 and the x2 parameter in the case of x1=a (I don’t really care about the x2 parameter here though). So the next thing I did was using the first formula and setting a very narrow prior for x2 when x1=a and a flatter prior for x2 when x1=b. This seemed to have worked, i.e. the x1 parameter for when x1=a does not have a huge HDI anymore, but I’m wondering if this is a legitimate approach or if there’s a better way to deal with such a situation?

Hope things are clear, let me know if not. Thanks!

ckrapu · November 23, 2022, 11:51pm

What do the estimates for the x2 parameter look like after using a narrow prior? Are the HDIs smaller and centered away from zero?

tcapretto · November 24, 2022, 12:21pm

I’m not sure I understand the problem completely. Perhaps it would be better if you could share a reproducible example. But what happens if you try the following model?

y ~ 0 + x1:x2

Edit: See this example

import arviz as az
import bambi as bmb
import numpy as np
import pandas as pd

data = pd.DataFrame(
    {
        "y": np.random.normal(size=8),
        "x1": ["a"] * 4 + ["b"] * 4,
        "x2": ["0"] * 4 + ["0", "1"] * 2
    }
)


priors = {"x1:x2": bmb.Prior("Normal", mu=0, sigma=1)}
model = bmb.Model("y ~ 0 + x1:x2", data, priors=priors)
idata = model.fit()
az.summary(idata, kind="stats")

az.plot_trace(idata, backend_kwargs={"tight_layout": True});

leomein · January 3, 2023, 1:40pm

Hi both, thanks for your answers, I hadn’t realized that there were replies… I think the given solution will work for me, I somehow assumed that the model would crash when trying to estimate x1:x2[a, 1] as there are no data, so I didn’t even try it like that. Thanks!

Topic		Replies	Views
Fitting with flat priors in bambi v5 bambi	1	34	May 4, 2025
Random variable transformations and Bambi version agnostic bambi	10	107	March 10, 2025
Can't fit binomial hierarchical model on proportion data version agnostic bambi , modeling	3	372	April 11, 2023
Unexpected behavior with arviz.plot_hdi() with categorical x bambi , arviz	3	93	January 22, 2025
Sensible prior for logistic regression with one or no input variable Questions	2	456	December 5, 2021

Highly correlated variables

Related topics