Multivariate Linear Regression with correlated input variables

Hello,
I am trying to fit a multivariate linear regression (Y = alpha + beta1 * X1 + beta2 * X2) in which my two input variables (X1, X2) are correlated.
I have uncertainties for all the variables (Y, X1, X2), and I am using the following code to estimate the intercept and the slopes:

import pymc3 as pm

with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of X1 and X2
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )

    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)  # test_values defined elsewhere

    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Likelihood
    mu = alpha + beta1 * obs[:, 0] + beta2 * obs[:, 1]
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error,
                        observed=y, nu=5)

y and y_error are the dependent variable ‘Y’ and its uncertainty, respectively.
observed_df is a DataFrame with X1 and X2 as columns.

How can I incorporate the uncertainties on X1 and X2 in my model?


Hi, a couple of previous threads that might help:

Hi,
Thanks for the links. However, neither thread talks about correlated input variables. That’s the main part I would like to tackle.

How can one incorporate the uncertainties with this MvNormal construct?

obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

The observed argument here only accepts the X1 and X2 observed data and not their uncertainties.
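
For what it's worth, the errors-in-variables examples I have seen turn this construct around: the MvNormal becomes a latent prior on the unknown true X values (no observed=), and the measured X1 and X2 enter through Normal likelihoods centred on those latents with the known uncertainties. A rough, untested sketch, reusing the variable names from my model above:

import pymc3 as pm

with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of the true X1 and X2
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)

    # Latent true X values: a free MvNormal (no observed=), so the
    # correlation structure acts as a prior rather than a likelihood
    X_true = pm.MvNormal("X_true", mu=μ, chol=chol,
                         shape=(len(observed_df), 2))

    # Measurement model: the data scatter around the latent true
    # values with the known per-point uncertainties
    pm.Normal("X1_obs", mu=X_true[:, 0], sigma=x1_error,
              observed=observed_df["X1"].values)
    pm.Normal("X2_obs", mu=X_true[:, 1], sigma=x2_error,
              observed=observed_df["X2"].values)

    # Likelihood, built on the latent true values
    mu = alpha + beta1 * X_true[:, 0] + beta2 * X_true[:, 1]
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error,
                        observed=y, nu=5)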

Here is another idea. Is this a possible solution? The changes w.r.t. my original block of code are in the section immediately above the likelihood.

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # Priors for unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of X1 and X2
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )

    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)  # test_values defined elsewhere

    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Update: split the observed MvNormal into its X1 and X2 columns
    x1_split, x2_split = tt.split(obs, [1, 1], n_splits=2, axis=1)
    x1_split = tt.squeeze(x1_split)
    x2_split = tt.squeeze(x2_split)

    # Latent X values centred on the data, with the known uncertainties
    X1 = pm.Normal("X1", mu=x1_split, sigma=x1_error, shape=len(observed_df))
    X2 = pm.Normal("X2", mu=x2_split, sigma=x2_error, shape=len(observed_df))

    # Likelihood
    mu = alpha + beta1 * X1 + beta2 * X2
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error,
                        observed=y, nu=5)

Aha, I see. Then more questions, sorry:

  • Do you want to estimate the covariance of X1 and X2?
  • Do you think the errors in X1 and X2 might also covary, and you want to estimate that too?

Cheers, Jon

Thanks for the follow-up. I just wanted to estimate the covariance of X1 and X2, but I would be interested in knowing how to do the latter as well.
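
My naive guess for the covarying errors would be to replace the two independent Normal measurement terms in the errors-in-variables sketch above with a single MvNormal whose noise gets its own LKJ prior, something like this (untested):

    # Second LKJ prior, for the correlation of the measurement errors
    chol_err, corr_err, stds_err = pm.LKJCholeskyCov(
        "chol_err", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )

    # Observed (X1, X2) pairs scatter around the latent true values
    # with correlated noise
    X_obs = pm.MvNormal("X_obs", mu=X_true, chol=chol_err,
                        observed=observed_df[["X1", "X2"]].values)

Note this estimates the error scales instead of plugging in the known x1_error/x2_error; keeping the known scales while estimating only the correlation would need a hand-built covariance matrix.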

Hi again,
Does anyone have any ideas on how to do this?

I think you'll want your two betas to be a single MvNormal instead of independent Normals; that way you can put a prior on the correlation between the two.
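
Something like this, sketched from memory and untested (the HalfNormal scale and eta=2.0 are placeholders, and y/y_error/observed_df are the arrays from your model):

import numpy as np
import pymc3 as pm

x1 = observed_df["X1"].values
x2 = observed_df["X2"].values

with pm.Model() as model:
    alpha = pm.Normal("alpha", mu=0, sigma=10)

    # LKJ prior on the correlation between the two slopes
    chol_beta, corr_beta, stds_beta = pm.LKJCholeskyCov(
        "chol_beta", n=2, eta=2.0,
        sd_dist=pm.HalfNormal.dist(10.0), compute_corr=True
    )
    beta = pm.MvNormal("beta", mu=np.zeros(2), chol=chol_beta, shape=2)

    # Same likelihood as before, with the correlated slopes
    mu = alpha + beta[0] * x1 + beta[1] * x2
    Y_obs = pm.StudentT("Y_obs", mu=mu, sigma=y_error, nu=5, observed=y)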