Linear regression: Bias (intercept) term "baked into" covariate data?

gbernstein · March 24, 2019, 5:34pm

I’m working with a standard Bayesian linear regression model, **y** ~ N(X theta, sigma^2), where we have k covariates and n individuals so X = n-by-k, theta = k-by-1, and y = n-by-1 matrices. If we want a bias term in this model, it’s easy to have the last element of theta as the bias weight and then have the last covariate, i.e. the last column of X, be all ones.

My issue is all of the posts on pymc3 linear regression I’ve seen, both on this discourse and abroad, model the bias term explicitly as a separate variable, e.g. like this.

Given my problem setup (that goes beyond standard Bayesian linear regression), it’s much easier for me to specify the model with the bias term baked into the covariate data. Is that possible to do within a pymc3 model?

X = pm.distributions.continuous.Normal('X', mu=mu, sd=sd, shape=(n, k))
X = np.concatenate((X, np.ones(n, 1))  # is there a way to do this in pymc3?

gbernstein · March 24, 2019, 5:40pm

Eventually figured out the necessary theano manipulations with help from this post. Figured I’d post anyways in case it helps others. (Mods, feel free to edit/delete as necessary.)

The ones column vector below needs to be created in a thean-ic way. Everything else is standard pymc3. Note that this specific code won’t work for k=1 because of the way pymc3 internally handles shape. This code can be modified as needed to handle that corner case, I’m just not posting it here for clarity. Also note the below code doesn’t incorporate any of the priors needed for actual Bayesian linear regression.

import numpy as np
import pymc3 as pm
import theano.tensor as T

k = 2 # number of covariates, not including bias term
n = 100 # number of individuals
with pm.Model():

    x_temp = pm.distributions.continuous.Normal('X', mu=np.array([0] * k), sd=1, shape=(n, k))
    ones = T.shape_padright(pm.math.ones_like(x_temp[:, 0]))
    
    x = pm.math.concatenate((x_temp, ones), axis=1)

    thetas = pm.distributions.multivariate.MvNormal('t', mu=np.array([0] * (k + 1)),
                                                         cov=np.diag(np.ones(k+1)),
                                                         shape=(1, k + 1))

    y = pm.distributions.continuous.Normal('y', mu=pm.math.flatten(pm.math.dot(thetas, x.T)), sd=1, shape=n)

    trace = pm.sampling.sample()

Topic		Replies	Views
Prediction by Bayesian linear regression Questions	7	1801	April 12, 2021
Regression model using only categorical variables Questions	9	5932	October 26, 2021
Building the linear equation of a logistic regression model Questions	13	1531	December 5, 2018
Bayesian Linear Regression with Equality Constraints Questions	3	1786	November 11, 2023
Bambi vs pymc model specification Questions linear_model	5	1320	February 26, 2022

Linear regression: Bias (intercept) term "baked into" covariate data?

Related topics