# Bayesian Linear Regression with Equality Constraints

Is it possible to perform Bayesian linear regression with equality constraints? Simply stated, I want to perform a multivariate linear regression with the constraint that my regression coefficients (which are possibly time varying) sum to 1. Thanks!

Yes you can – but often times it can be a struggle to get the prior you want to really work.

Setting up the model is super straightforward:

import numpy as np
import scipy as sp
import theano.tensor as tt
import pymc3 as pm
import seaborn as sbn
from matplotlib import pyplot as plt

k = 8
n = 70

coefs = np.random.uniform(-3, 3, size=(k,))
coefs = coefs - np.sum(coefs) + 1  # now sum to 1

X = np.random.normal(size=(n, k))
err = np.random.normal(size=n) * 5

y_ = np.dot(X, coefs)
y = y_ + err

with pm.Model() as mod:
beta_se = pm.HalfNormal('beta_se', 1.)
err_se = pm.HalfNormal('err_se', 1.)
coef_vec_first = pm.Normal('coef_vec_first', 0, beta_se, shape=(k-1,))
coef_vec = pm.Deterministic('coef_vec', tt.concatenate([coef_vec_first, tt.stack(1 - tt.sum(coef_vec_first))])
y_pred = pm.Deterministic('y_predicted', tt.dot(X, coef_vec))
y_obs = pm.Normal('y_lik', mu=y_pred, sd=err_se, observed=y)

trace = pm.sample(500, tune=1000, cores=2, chains=4)
prpred = pm.sample_prior_predicted(1500)


the trick, for a simple linear constraint dot(a, x)=c is to set the last component of x to be x_k = (c - a_{1:k-1}^Tx_{1:k-1})/a_k. This works fine:

>>> trace['coef_vec'].sum(axis=1)
array([1., 1., 1., ..., 1., 1., 1.])


If that’s all you wanted to do then you’re done.

However, you no longer have identical priors across the coefficients. In fact (for beta_se=1) the last coefficient is mean-1, variance k-1. Using mu=1/(k-1) will fix the mean, but you cant make the variance of every component to be equal using this method. This may be an unavoidable consequence of having a deterministic parameter (the constraint) as part of the model. I’m unfamiliar with any literature that manages to put identical marginal priors on a constrained parameter; but to start you’d want a distribution \mathcal{D}_f^{(k)} such that if x_1, \dots, x_k \sim_{iid} \mathcal{D}_f^{(k)} then f(x_1, \dots, x_k) \sim \mathcal{D}_f^{(k)}.

Edit: This is trivially possible for some functions f. For instance if f = \frac{1}{\sqrt{k}}\sum_{i=1}^k x_i then \mathcal{D}_f^{(k)} = N(0, \sigma^2) is a solution. But this corresponds to the constraint that the coefficients sum to 0.

2 Likes

Thanks! This is very helpful, I will give it a try