Hi <3 I am playing with the penguins dataset, which looks like this:
I have learned some syntax to estimate all three penguin species individually, like an ANOVA. Here, I use shape=3 to set 3 sigma priors and 3 mu priors (one for each of the three species). Then, I use [penguin_species.codes] to tell the likelihood which rows come from which species.
with pm.Model() as model_0:
# Priors of shape 3, because 3 species.
sigma = pm.HalfStudentT("sigma", 100, 2000, shape=3)
mu = pm.Normal("mu", 4000, 3000, shape=3)
# {y} is penguin mass.
y = pm.Normal(
"mass",
mu=mu[penguin_species.codes],
sigma=sigma[penguin_species.codes],
observed=penguins["body_mass_g"]
)
idata_0 = pm.sample()
idata_0.extend(pm.sample_prior_predictive(samples=5000))
pm.sample_posterior_predictive(idata_0, extend_inferencedata=True)
This gives me six parameter distributions: 3 sigmas and 3 mus. Perfect.
But, when I want to extend to a proper linear regression with a continuous covariate, I don’t know the best syntax to combine continuous covariates with categoricals. If I just want to do continuous covariates, I can construct the design matrix with B and X, then do pm.math.dot(B,X):
with pm.Model() as model_1:
# Set priors.
sigma = pm.HalfStudentT("sigma", 100, 2000)
B = pm.Normal("B", 0, 3000, shape=2)
# Define {mu} with the linear model.
mu = pm.math.dot(X, B)
# {y} is penguin mass.
y = pm.Normal(
"y",
mu=mu,
sigma=sigma,
observed=penguins["body_mass_g"]
)
idata_1 = pm.sample()
idata_1.extend(pm.sample_prior_predictive(samples=5000))
pm.sample_posterior_predictive(idata_1, extend_inferencedata=True)
But, what if I want to combine the two methods and use BOTH continuous and categorical predictors? Do I have to manually construct a design matrix using something like pd.get_dummies, and then follow my 2nd method using pm.math.dot?
That seems like a lot of manual work, so I suspect there is an easier way???