Rolling Regression in PyMC

Rolling regression is really just a bunch of independent regressions, one per window, whose results you collect at the end. That's easy to do in PyMC:
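To make the idea concrete before going Bayesian, here's a minimal sketch of what rolling regression computes, on made-up data with a drifting slope (using `np.polyfit` as a stand-in OLS, not the statsmodels estimator from the original example):

```python
import numpy as np

# Hypothetical toy data: y depends on x with a slope that drifts over time
rng = np.random.default_rng(0)
n, window = 200, 60
x = rng.normal(size=n)
true_beta = np.linspace(0.5, 1.5, n)  # slowly drifting slope
y = true_beta * x + rng.normal(scale=0.1, size=n)

# "Rolling regression": one independent OLS per window, collect the slopes
slopes = [np.polyfit(x[i:i + window], y[i:i + window], deg=1)[0]
          for i in range(n - window + 1)]

print(len(slopes))  # n - window + 1 windows
```

Each window's fit knows nothing about the others; the PyMC version below just runs all of those fits in one model.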

```python
from functools import reduce

import pymc as pm
import arviz as az

# Other imports + data preparation as in the statsmodels example

def make_window_data(series, window_size):
    """Stack every length-`window_size` window of `series` as a column."""
    n = series.shape[0]
    col_name = series.name
    windowed_list = [
        series.reset_index(drop=True)
              .to_frame()
              .iloc[i:window_size + i]
              .rename(columns={col_name: i})
        for i in range(n)
    ]
    windowed_df = reduce(
        lambda left, right: left.join(right.reset_index(drop=True)),
        windowed_list,
    )

    return windowed_df

windowed_endog = make_window_data(endog, 60)
windowed_exog = make_window_data(exog['Mkt-RF'], 60)

# Statsmodels drops the trailing (incomplete) windows because of the
# missing data, but in principle you don't have to do this
windowed_endog.dropna(axis=1, how='any', inplace=True)
windowed_exog.dropna(axis=1, how='any', inplace=True)
```
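To see what the helper produces, here it is applied to a toy series (made-up data, not the market returns from the example). Each column is one window, and dropping the NaN columns leaves only the full-length windows:

```python
from functools import reduce

import pandas as pd

# Same helper as above
def make_window_data(series, window_size):
    n = series.shape[0]
    col_name = series.name
    windowed_list = [
        series.reset_index(drop=True)
              .to_frame()
              .iloc[i:window_size + i]
              .rename(columns={col_name: i})
        for i in range(n)
    ]
    return reduce(
        lambda left, right: left.join(right.reset_index(drop=True)),
        windowed_list,
    )

toy = pd.Series(range(10), name='y')
w = make_window_data(toy, window_size=4).dropna(axis=1, how='any')

print(w.shape)   # (4, 7): 10 - 4 + 1 full windows
print(list(w[0]))  # first window: [0, 1, 2, 3]
```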

This gives you a window_size by (n - window_size + 1) array of data, one column per regression. Then just fit them all in the same model:

```python
coords = {
    'regression': windowed_endog.columns
}

with pm.Model(coords=coords) as mod:
    # One (alpha, beta, sigma) triple per window
    betas = pm.Normal('beta', dims=['regression'])
    alphas = pm.Normal('alpha', dims=['regression'])
    sigmas = pm.HalfCauchy('sigmas', beta=0.8, dims=['regression'])

    # (n_regressions,) broadcasts against (window_size, n_regressions),
    # so each column of mu is an independent regression
    mus = alphas + betas * windowed_exog

    obs = pm.Normal('obs', mu=mus, sigma=sigmas, observed=windowed_endog.values)
    idata = pm.sample()
```

The “key move” is that I use broadcasting to compute mu: the betas have shape (n_regressions,), so they broadcast over the columns of windowed_exog, and each column of mu corresponds to an independent regression. When you hand the matrix-valued mu to pm.Normal for the likelihood, PyMC likewise interprets it as a collection of independent normal distributions, one per column.
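The broadcasting step can be checked in plain NumPy, with made-up shapes (60-row windows, 5 regressions):

```python
import numpy as np

window_size, n_reg = 60, 5
alphas = np.zeros(n_reg)  # shape (5,)
betas = np.ones(n_reg)    # shape (5,)
exog = np.random.default_rng(1).normal(size=(window_size, n_reg))

# (5,) broadcasts against (60, 5) along the last axis, so column j of mu
# is alphas[j] + betas[j] * exog[:, j]: one regression per column
mus = alphas + betas * exog

print(mus.shape)  # (60, 5)
```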

Results for the first 25 regressions (with statsmodels estimates as red triangles):
