# Multivariate Linear Regression with correlated input variables

Hello,
I am trying to fit a multivariate linear regression (Y = alpha + beta1 * X1 + beta2 * X2) where my two input variables (X1, X2) are correlated.
I have uncertainties for all the variables (Y, X1, X2). I am using the following code to estimate the intercept and the slopes:

```python
with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of X1 and X2
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )

    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)

    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Likelihood
    mu = alpha + beta1 * obs[:, 0] + beta2 * obs[:, 1]
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error, nu=5,
                        observed=y)
```

`y` and `y_error` are the dependent variable ‘Y’ and its uncertainty, respectively.
`observed_df` is a DataFrame with X1 and X2 as columns.

How can I incorporate the uncertainties on X1 and X2 in my model?


Hi, a couple of previous threads that might help:

Hi,
Thanks for the links. However, neither thread talks about correlated input variables. That’s the main part I would like to tackle.

How can one incorporate the uncertainties with this MvNormal construct?

```python
obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)
```

The `observed` argument here only accepts the X1 and X2 observed data and not their uncertainties.
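As background for why this matters, here is a self-contained NumPy simulation (entirely made-up numbers, not part of the model above): ignoring measurement error in the predictors biases least-squares slopes toward zero (attenuation), and with correlated predictors the bias can also leak between coefficients.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 20_000
alpha, beta1, beta2 = 1.0, 2.0, -1.0

# True predictors, correlated with rho = 0.6
cov = np.array([[1.0, 0.6],
                [0.6, 1.0]])
X_true = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Response with its own observation noise
y = alpha + X_true @ np.array([beta1, beta2]) + rng.normal(0, 0.5, n)

# Observed predictors carry measurement error (sd = 1 here)
X_obs = X_true + rng.normal(0, 1.0, size=(n, 2))

# Least squares on the true predictors recovers the betas
A_true = np.column_stack([np.ones(n), X_true])
coef_true = np.linalg.lstsq(A_true, y, rcond=None)[0]

# Naive least squares on the noisy predictors: slopes shrink toward zero
A_obs = np.column_stack([np.ones(n), X_obs])
coef_naive = np.linalg.lstsq(A_obs, y, rcond=None)[0]

print(coef_true)   # close to [1, 2, -1]
print(coef_naive)  # attenuated slopes
```

A model that treats the true X values as latent variables, as in the block below, avoids this bias.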

Is this a possible solution? The changes w.r.t. the above block of code are in the section immediately above the declaration of the likelihood.

```python
with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of X1 and X2
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )

    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)

    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Update: latent "true" X1 and X2, centered on the observed columns
    # with the known measurement errors as scales.
    # (Plain column indexing replaces the tt.split/tt.squeeze combination,
    # which would fail: indexing a squeezed 1-D tensor with [:, 0] raises.)
    x1_split = obs[:, 0]
    x2_split = obs[:, 1]

    X1 = pm.Normal("X1", mu=x1_split, sigma=x1_error, shape=len(x1))
    X2 = pm.Normal("X2", mu=x2_split, sigma=x2_error, shape=len(x2))

    # Likelihood
    mu = alpha + beta1 * X1 + beta2 * X2
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error, nu=5,
                        observed=y)
```

Aha, I see. Then more questions, sorry:

• Do you want to estimate the covariance of X1 and X2?
• Do you think the errors in X1 and X2 might also covary, and do you want to estimate that too?

Cheers, Jon

Thanks for the follow-up. I just wanted to estimate the covariance of X1 and X2. However, I would be interested in knowing how to do the latter as well.

Hi again,
Does anyone have any ideas on how to do this?

I think you’ll want your two betas to come from an MvNormal instead of two independent Normals. You can then specify a prior for the correlation between them.
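To make that concrete, here is a plain-NumPy sketch of the prior only (rho = 0.5 is an arbitrary illustrative choice, and the scale of 10 matches the Normal(0, 10) priors used above): a joint prior on (beta1, beta2) amounts to choosing a 2×2 covariance whose off-diagonal term sets the prior correlation between the slopes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent Normal(0, 10) priors correspond to a diagonal covariance
# diag(100, 100); a joint prior adds an off-diagonal term rho * 10 * 10.
rho = 0.5
prior_cov = np.array([[100.0, rho * 100.0],
                      [rho * 100.0, 100.0]])

# Draws from the joint prior on (beta1, beta2)
betas = rng.multivariate_normal([0.0, 0.0], prior_cov, size=100_000)
print(np.corrcoef(betas.T)[0, 1])  # empirical correlation, ≈ 0.5
```

In the PyMC model this would look something like `betas = pm.MvNormal("betas", mu=np.zeros(2), cov=prior_cov, shape=2)` for a fixed correlation, or an `LKJCholeskyCov` prior (as already used for X1 and X2 above) if you want to learn the correlation from the data.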