Multiple Linear Regression

Hello community!

I have a dataset of 5 features with ~800 rows.
As per the quickstart guide, I'll have to define priors for an alpha and 5 betas.

I'm building a model for 2nd-order polynomials, so a further 5 beta distributions are added.
Here's how the code looks:

```python
with pm.Model() as my_model:
    a = pm.Normal('a', 0, sd=10)
    b11 = pm.Normal('b11', 0, sd=10)
    b12 = pm.Normal('b12', 0, sd=10)
    b21 = pm.Normal('b21', 0, sd=10)
    b22 = pm.Normal('b22', 0, sd=10)
    b31 = pm.Normal('b31', 0, sd=10)
    b32 = pm.Normal('b32', 0, sd=10)
    b41 = pm.Normal('b41', 0, sd=10)
    b42 = pm.Normal('b42', 0, sd=10)
    b51 = pm.Normal('b51', 0, sd=10)
    b52 = pm.Normal('b52', 0, sd=10)

    eps = pm.HalfCauchy('eps', 5)
    mu = (a + b11*x1 + b12*x1**2 + b21*x2 + b22*x2**2 + b31*x3 + b32*x3**2
          + b41*x4 + b42*x4**2 + b51*x5 + b52*x5**2)
    y = pm.Normal('y', mu=mu, sd=eps, observed=y)

trace = pm.sample(draws=300, chains=2, cores=2, model=my_model)
```

The expression for mu is quite large, and I had to define a whopping 10 distributions for all the beta priors.

NUTS takes almost 1.5 hours to sample a simple dataset of 800 rows (maybe because of so many parameters).

Is there another way to define a model for a large feature set, or am I missing something?

The key ingredient for improving speed is to vectorize your operations. If you're unfamiliar with the term, it means writing the operations in your model as matrix-vector products or element-wise operations, which take advantage of the contiguous memory layout of the arrays. An analogous model to yours, using a different representation, could look like this:

```python
A = np.hstack((
    np.ones((len(X), 1)),  # intercept column
    X,                     # 5 linear terms
    X**2,                  # 5 quadratic terms
))

with pm.Model():
    beta = pm.Normal('beta', 0, 10, shape=11)
    mu = pm.math.dot(A, beta)
    eps = pm.HalfCauchy('eps', 5)
    y = pm.Normal('y', mu=mu, sd=eps, observed=y)
```

As you can see, the big product-and-sum operation was replaced by a single dot product, which takes advantage of optimized linear algebra libraries like MKL or OpenBLAS that Theano uses when available.
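To convince yourself the two representations are equivalent, here is a small NumPy-only sketch (the column ordering of `beta` is an assumption; it just has to match how `A` is stacked):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(800, 5))   # 5 features, ~800 rows as in the question

# Design matrix: intercept, 5 linear columns, 5 quadratic columns -> shape (800, 11)
A = np.hstack((np.ones((len(X), 1)), X, X**2))
beta = rng.normal(size=11)

# Expanded form: a + sum_i (b_i1 * x_i + b_i2 * x_i**2)
mu_loop = (beta[0]
           + sum(beta[1 + i] * X[:, i] for i in range(5))
           + sum(beta[6 + i] * X[:, i]**2 for i in range(5)))

# Vectorized form: one dot product
mu_vec = A @ beta

print(np.allclose(mu_loop, mu_vec))  # True
```

The same idea carries over directly to the PyMC model: one `beta` vector with `shape=11` replaces the 11 scalar priors.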

You can always profile your model's logp and gradient to explore the computational bottlenecks.
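As a generic illustration of the idea (outside PyMC, with a hand-written Gaussian log-likelihood whose names are made up here), you can time repeated logp evaluations with the standard-library profiler and see where the cost goes:

```python
import cProfile
import io
import pstats

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(800, 11))   # design matrix: intercept + 5 linear + 5 quadratic terms
y = rng.normal(size=800)
beta = rng.normal(size=11)

def logp(beta, eps=1.0):
    """Unnormalized Gaussian log-likelihood of the vectorized regression."""
    mu = A @ beta
    return -0.5 * np.sum(((y - mu) / eps) ** 2)

profiler = cProfile.Profile()
profiler.enable()
for _ in range(1000):                # simulate many logp evaluations, as a sampler would do
    logp(beta)
profiler.disable()

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())             # top 5 entries by cumulative time
```

Within PyMC itself, the model object exposes its own profiling hooks for the compiled logp and gradient; the report tells you which operations dominate, which is how you confirm that the dot product (rather than Python-level overhead) is doing the work.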

Thanks a lot for the suggestions @lucianopaz. I vectorized over a thousand betas in a single line :sunglasses: for a pooled hierarchical regression, and sampling speed improved enormously (~50x)!
