How to add an L1 Regularization on the likelihood when use pymc3 to sample a MCMC

voyager112 · April 16, 2019, 1:11pm

When I use pymc3 to do a sparse regression, I would like to add an L1 regularization to the objective function. How to achieve this in pymc3?

Thanks a lot!

junpenglao · April 16, 2019, 1:20pm

Depending on how you want to do this.

A Bayesian way: If you place a Laplace prior on the parameters (regression coefficients), you get essentially a L1 regularized regression.

A non-Bayesian way: add the L1 regularization using pm.Potential

X = ... # Your design matrix
lam = ... # constant controlling the sparsity 
with pm.Model() as m:
   betas = pm.Flat('betas', shape=X.shape[1])  < == DONT do this usually, it is just to replicate the L1 regularization completely
   pm.Potential('sum_of_square_error', pm.math.sum(pm.math.square(y - pm.math.dot(X, betas))))
   pm.Potential('l1', lam*pm.math.sum(pm.math.abs(betas)))

If you optimize the model m using pm.find_MAP, the solution is the sparse regression solution.

voyager112 · April 17, 2019, 1:21pm

Thank you so much!:

KIC1 · April 20, 2019, 7:56am

I am also very interested in l1 regularization, but I still don’t get it. So when you are saying “If you place a Laplace prior on the parameters (regression coefficients)”, then you essentially mean something like this?

# l1 regression
# where x is shape (10, m) and y is shape (10, 1). 
with pm.Model() as l1_model:
    λ = 1e-10 # is this our penalization control parameter?
    l1 = pm.Laplace("l1", 0, λ)
    
    β = pm.Normal("beta", l1, 1, shape=(x.shape[1], 1))
    ϵ = pm.Normal("epsilon", 0, 1)
    σ = pm.HalfCauchy("sigma", 1)
    
    y_hat = tt.dot(x, β) + ϵ
    ar = pm.Normal("ar", y_hat, sd=σ, observed=y)

I am not so sure if this works as expected since my betas look very similar to the “normal” model:

with pm.Model() as model:
    β = pm.Normal("beta", 0, 100, shape=(x.shape[1], 1))
    ϵ = pm.Normal("epsilon", 0, 1)
    σ = pm.HalfCauchy("sigma", 1)
    
    y_hat = tt.dot(x, β) + ϵ
    ar = pm.Normal("ar", y_hat, sd=σ, observed=y)

lucianopaz · April 20, 2019, 4:40pm

What @junpenglao meant was to put the Laplace prior over beta. Consider the following model:

with pm.Model() as l1_model:
    lam = 1e-10 # is this our penalization control parameter?
    
    β = pm.Laplace("beta", 0, 1/lam, shape=(x.shape[1],))
    epsilon = pm.Laplace("epsilon", 0, 1/lam)
    
    y_hat = tt.dot(x, beta) + epsilon
    ar = pm.Normal("ar", y_hat, sd=1, observed=y)

The model’s logp can be written as

\sum[-\frac{1}{2}(y-\hat{y})^{2} - \lambda (|\beta| + |\epsilon|)]

This is just like minimizing the mean squared difference and adding an L1 regularization over the regression parameters. That’s why @junpenglao said that if you used find_map, you would get the maximum likelihood parameters with the L1 regularization.

lucianopaz · April 20, 2019, 5:09pm

As a nice sidenote, if you had placed gaussian priors, you would have gotten something similar to L2 regularization. As with the L1 case, you just have to look at the resulting logp that would be used by find_map.

Like these two, there are many more analogies between Bayesian priors and different forms of regularizations. @brandonwillard is much more knowledgeable than me on the subject.

Topic		Replies	Views
Bayesian sparse logistic regression in PyMC3 Questions	2	678	February 7, 2020
L1 regularization with positivity constraint in hierarchical model Questions	2	546	June 15, 2021
L1 and L2 pegularization for an Autoregressive model Questions	4	1155	January 1, 2020
How to use Empirical Bayes（EB） in pymc3 Questions	8	1413	December 26, 2018
The pymc3 way: Linear regression and inferring given posterior/trace Questions	4	1757	May 9, 2018

How to add an L1 Regularization on the likelihood when use pymc3 to sample a MCMC

Related topics