I am using pymc3 to do a sparse regression, and I would like to add an L1 regularization term to the objective function. How can I achieve this in pymc3?
Thanks a lot!
It depends on how you want to do this.
A Bayesian way: if you place a Laplace prior on the parameters (the regression coefficients), you essentially get an L1-regularized regression.
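For example, a minimal sketch of the Bayesian version (assuming X is your design matrix and y your targets; the Laplace scale b controls the shrinkage, roughly like 1/lambda):

import pymc3 as pm

# X: (n, p) design matrix, y: (n,) targets -- assumed to already exist
with pm.Model() as laplace_model:
    # Laplace (double-exponential) prior on the coefficients;
    # a smaller scale b corresponds to a stronger L1-like penalty
    betas = pm.Laplace('betas', mu=0., b=1., shape=X.shape[1])
    sigma = pm.HalfCauchy('sigma', beta=1.)
    obs = pm.Normal('obs', mu=pm.math.dot(X, betas), sd=sigma, observed=y)
    trace = pm.sample(1000, tune=1000)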
A non-Bayesian way: add the L1 regularization using pm.Potential
import pymc3 as pm

X = ...    # your design matrix
y = ...    # your targets
lam = ...  # constant controlling the sparsity
with pm.Model() as m:
    # DON'T use a Flat prior in general -- it is here only to replicate plain L1 regularization
    betas = pm.Flat('betas', shape=X.shape[1])
    # pm.Potential adds its term to the logp, so the error and the penalty enter with a minus sign
    pm.Potential('sum_of_square_error', -pm.math.sum((y - pm.math.dot(X, betas)) ** 2))
    pm.Potential('l1', -lam * pm.math.sum(pm.math.abs_(betas)))
If you optimize the model m using pm.find_MAP, the solution is the sparse regression solution.
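For example (a quick sketch, reusing the m and lam defined above):

with m:
    # find_MAP maximizes the model logp, i.e. it minimizes
    # the sum of squared errors plus lam times the L1 norm of betas
    map_estimate = pm.find_MAP()

print(map_estimate['betas'])  # coefficients are shrunk toward zero for larger lam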
Thank you so much!
I am also very interested in L1 regularization, but I still don't get it. So when you say "If you place a Laplace prior on the parameters (regression coefficients)", do you essentially mean something like this?
# L1 regression
# where x is shape (10, m) and y is shape (10, 1).
with pm.Model() as l1_model:
    λ = 1e-10  # is this our penalization control parameter?
    l1 = pm.Laplace("l1", 0, λ)
    β = pm.Normal("beta", l1, 1, shape=(x.shape[1], 1))
    ϵ = pm.Normal("epsilon", 0, 1)
    σ = pm.HalfCauchy("sigma", 1)
    y_hat = tt.dot(x, β) + ϵ
    ar = pm.Normal("ar", y_hat, sd=σ, observed=y)
I am not so sure this works as expected, since my betas look very similar to the "normal" model:
with pm.Model() as model:
    β = pm.Normal("beta", 0, 100, shape=(x.shape[1], 1))
    ϵ = pm.Normal("epsilon", 0, 1)
    σ = pm.HalfCauchy("sigma", 1)
    y_hat = tt.dot(x, β) + ϵ
    ar = pm.Normal("ar", y_hat, sd=σ, observed=y)
What @junpenglao meant was to put the Laplace prior directly over beta. Consider the following model:
with pm.Model() as l1_model:
    lam = 1e-10  # penalization control parameter (the Laplace scale is 1/lam)
    beta = pm.Laplace("beta", 0, 1 / lam, shape=(x.shape[1],))
    epsilon = pm.Laplace("epsilon", 0, 1 / lam)
    y_hat = tt.dot(x, beta) + epsilon
    ar = pm.Normal("ar", y_hat, sd=1, observed=y)
The model's logp can be written (up to additive constants) as

$$\log p(\beta, \epsilon \mid y) = -\tfrac{1}{2}\sum_i \big(y_i - x_i^\top \beta - \epsilon\big)^2 \;-\; \lambda \sum_j \lvert\beta_j\rvert \;-\; \lambda \lvert\epsilon\rvert,$$

where $\lambda$ is the lam in the code above. This is just like minimizing the squared differences while adding an L1 regularization over the regression parameters. That's why @junpenglao said that if you use find_MAP, you get the maximum a posteriori parameters, which are exactly the L1-regularized least squares solution.
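A quick way to see this in practice (a sketch, assuming the l1_model above):

with l1_model:
    map_estimate = pm.find_MAP()

# with a large lam the Laplace prior shrinks many entries of beta toward zero;
# with lam = 1e-10 the prior is nearly flat, so the MAP is close to ordinary least squares
print(map_estimate["beta"])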
As a nice side note, if you had placed Gaussian priors instead, you would have gotten something similar to L2 (ridge) regularization. As in the L1 case, you just have to look at the resulting logp that find_MAP would maximize.
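For instance, a sketch of the L2 analogue (hypothetical names; the Normal prior's precision plays the role of the ridge penalty):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as l2_model:
    lam = 1.0  # ridge penalty strength; the prior sd is 1 / sqrt(lam)
    beta = pm.Normal("beta", 0, sd=1 / np.sqrt(lam), shape=(x.shape[1],))
    y_hat = tt.dot(x, beta)
    ar = pm.Normal("ar", y_hat, sd=1, observed=y)
    # the MAP estimate here is the ridge (L2-regularized) least squares solution
    map_estimate = pm.find_MAP()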
Beyond these two, there are many more analogies between Bayesian priors and different forms of regularization. @brandonwillard is much more knowledgeable than me on the subject.