HNLala
April 28, 2021, 1:57am
Hello,
I am trying to fit a multiple linear regression (Y = alpha + beta1 * X1 + beta2 * X2) in which my two input variables (X1, X2) are correlated.
I have uncertainties for all the variables (Y, X1, X2), and I am using the following code to estimate the intercept and the slopes:
```python
import pymc3 as pm

# test_values, observed_df, y, and y_error are defined elsewhere
with pm.Model() as model:
    # Priors for the unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of (X1, X2)
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)
    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Likelihood
    mu = alpha + beta1 * obs[:, 0] + beta2 * obs[:, 1]
    Y_obs = pm.StudentT("Y_obs", nu=5, mu=mu, sigma=y_error, observed=y)
```
`y` and `y_error` are the dependent variable Y and its uncertainty, respectively. `observed_df` is a DataFrame with X1 and X2 as columns.
How can I incorporate the uncertainties on X1 and X2 in my model?
Hi, a couple of previous threads that might help:
Hi,
Thanks for the links. However, neither thread addresses correlated input variables, which is the main part I would like to tackle.
How can one incorporate the uncertainties with this MvNormal construct?
```python
obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)
```
The `observed` argument here only accepts the X1 and X2 observed data, not their uncertainties.
Is this a possible solution? The changes relative to the block of code above are in the section immediately before the likelihood.
```python
import pymc3 as pm
import theano.tensor as tt

# test_values, observed_df, x1, x2, x1_error, x2_error, y, and y_error
# are defined elsewhere
with pm.Model() as model:
    # Priors for the unknown model parameters
    alpha = pm.Normal("alpha", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    beta2 = pm.Normal("beta2", mu=0, sigma=10)

    # Prior for the covariance of (X1, X2)
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0), compute_corr=True
    )
    μ = pm.Normal("μ", 0., 10., shape=2, testval=test_values)
    obs = pm.MvNormal("obs", μ, chol=chol, observed=observed_df)

    # Update: split the observed matrix into its two columns and treat
    # them as noisy measurements of latent predictors X1 and X2
    x1_split, x2_split = tt.split(obs, [1, 1], n_splits=2, axis=1)
    x1_split = tt.squeeze(x1_split)  # (n, 1) -> (n,)
    x2_split = tt.squeeze(x2_split)
    X1 = pm.Normal("X1", mu=x1_split, sigma=x1_error, shape=len(x1))
    X2 = pm.Normal("X2", mu=x2_split, sigma=x2_error, shape=len(x2))

    # Likelihood
    mu = alpha + beta1 * X1 + beta2 * X2
    Y_obs = pm.StudentT("Y_obs", nu=5, mu=mu, sigma=y_error, observed=y)
```
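A side note on the indexing in the update section: `tt.split` followed by `tt.squeeze` is just column extraction, and each squeezed column should already be a flat length-n vector (indexing the squeezed vector again with `[:, 0]` would fail). A NumPy analogue, with a made-up toy array, shows the intended shapes:

```python
import numpy as np

# Toy stand-in for the (n, 2) matrix of observed X1, X2 values
obs = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 30.0]])

# NumPy analogue of tt.split(obs, [1, 1], n_splits=2, axis=1)
x1_col, x2_col = np.split(obs, 2, axis=1)  # each has shape (3, 1)

# Squeezing drops the trailing length-1 axis, leaving flat vectors
x1 = np.squeeze(x1_col)  # shape (3,), equal to obs[:, 0]
x2 = np.squeeze(x2_col)  # shape (3,), equal to obs[:, 1]
```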
Aha, I see. Then more questions, sorry:
Do you want to estimate the covariance of X1 and X2?
Do you think the errors in X1 and X2 might also covary, and do you want to estimate that too?
Cheers, Jon
Thanks for the follow-up. I just wanted to estimate the covariance of X1 and X2, but I would be interested in knowing how to do the latter as well.
Hi again,
Does anyone have any ideas on how to do this?
I think you’ll want your two betas to be MvNormals instead of Normals. There you can then specify a prior for the correlation between the two.