# Syntax to mix continuous covariates with dummy categoricals

Hi <3 I am playing with the penguins dataset.

I have learned some syntax to estimate all three penguin species individually, like an ANOVA. Here, I use shape=3 to get 3 sigma priors and 3 mu priors (one for each of the three species). Then I index with [penguin_species.codes] to tell the likelihood which rows come from which species.

```python
import pymc as pm

# penguin_species is the species column as a pandas Categorical,
# e.g. penguin_species = penguins["species"].astype("category")

with pm.Model() as model_0:
    # Priors of shape 3, because 3 species.
    sigma = pm.HalfStudentT("sigma", 100, 2000, shape=3)
    mu = pm.Normal("mu", 4000, 3000, shape=3)

    # {y} is penguin mass; index the priors by each row's species code.
    y = pm.Normal(
        "mass",
        mu=mu[penguin_species.codes],
        sigma=sigma[penguin_species.codes],
        observed=penguins["body_mass_g"],
    )

    idata_0 = pm.sample()
    idata_0.extend(pm.sample_prior_predictive(samples=5000))
    pm.sample_posterior_predictive(idata_0, extend_inferencedata=True)
```

This gives me six parameter distributions: 3 sigmas and 3 mus. Perfect.

But when I want to extend this to a proper linear regression with a continuous covariate, I don't know the best syntax for combining continuous covariates with categoricals. For continuous covariates alone, I can construct the design matrix X and coefficient vector B, then take pm.math.dot(X, B):

```python
with pm.Model() as model_1:
    # Set priors. X is a design matrix with an intercept column and
    # one continuous covariate, hence shape=2 for B.
    sigma = pm.HalfStudentT("sigma", 100, 2000)
    B = pm.Normal("B", 0, 3000, shape=2)

    # Define {mu} with the linear model.
    mu = pm.math.dot(X, B)

    # {y} is penguin mass.
    y = pm.Normal(
        "y",
        mu=mu,
        sigma=sigma,
        observed=penguins["body_mass_g"],
    )

    idata_1 = pm.sample()
    idata_1.extend(pm.sample_prior_predictive(samples=5000))
    pm.sample_posterior_predictive(idata_1, extend_inferencedata=True)
```

But what if I want to combine the two methods and use BOTH continuous and categorical predictors? Do I have to manually construct a design matrix using something like pd.get_dummies, and then follow my second method using pm.math.dot?

That seems like a lot of manual work, so I suspect there is an easier way???

Yes, I think so.

That's pretty much the purpose of the Bambi library: BAyesian Model-Building Interface (Bambi) in Python (Bambi 0.12.0.dev documentation).

I would say that for categorical covariates, it's better to index into a vector of variables than to use a design matrix. See the multilevel modeling example for implementation details, or this Statistical Rethinking video for a higher-level explanation.

Thanks, both. Gonna check out these links.

I was able to replicate both methods! I really love Bambi; hope to focus on that for a while.

But I first wanted to build the same no-pooling model with both methods and check that they matched. Alas, Bambi really really really wants to fit a partial-pooling model (Getting Started, Bambi 0.12.0.dev documentation). I can't figure out how to make Bambi do no pooling.

Seems like a new question, so I'll make another post!


Duh, it's literally so easy to do a no-pooling model in Bambi that I didn't even think of it. Just do `y ~ x + non_pooled_group`.
