Rolling regression is really just a collection of independent regressions run over sliding windows, with the results collected afterwards. You can set that up easily in PyMC:
from functools import reduce
import pymc as pm
import arviz as az
# Other imports + data preparation as from the statsmodels example
def make_window_data(series, window_size):
    n = series.shape[0]
    col_name = series.name
    windowed_list = [
        series.reset_index(drop=True).to_frame().iloc[i:window_size + i].rename(columns={col_name: i})
        for i in range(n)
    ]
    windowed_df = reduce(lambda left, right: left.join(right.reset_index(drop=True)), windowed_list)
    return windowed_df
windowed_endog = make_window_data(endog, 60)
windowed_exog = make_window_data(exog['Mkt-RF'], 60)
# Statsmodels drops the ends (because there's missing data) but in principle you don't have to do this
windowed_endog.dropna(axis=1, how='any', inplace=True)
windowed_exog.dropna(axis=1, how='any', inplace=True)
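As a quick sanity check of what the helper produces, here it is on a toy series (the toy data and names are illustrative, not from the example above). Each column is one window, keyed by its start index, and the trailing incomplete windows are the ones dropna removes:

```python
from functools import reduce

import numpy as np
import pandas as pd


def make_window_data(series, window_size):
    # Same helper as above: one column per window start index
    n = series.shape[0]
    col_name = series.name
    windowed_list = [
        series.reset_index(drop=True).to_frame().iloc[i:window_size + i].rename(columns={col_name: i})
        for i in range(n)
    ]
    return reduce(lambda left, right: left.join(right.reset_index(drop=True)), windowed_list)


toy = pd.Series(np.arange(6, dtype=float), name="y")
w = make_window_data(toy, window_size=3)
print(w.shape)          # (3, 6): window_size rows, one column per start index

complete = w.dropna(axis=1, how="any")
print(complete.shape)   # (3, 4): only the n - window_size + 1 complete windows survive
```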
This gives you a (window_size, n - window_size + 1) array of data that you'll use to do a bunch of regressions. Then just run them all in the same MCMC run:
coords = {
    'regression': windowed_endog.columns
}
with pm.Model(coords=coords) as mod:
    betas = pm.Normal('beta', dims=['regression'])
    alphas = pm.Normal('alpha', dims=['regression'])
    sigmas = pm.HalfCauchy('sigmas', beta=0.8, dims=['regression'])
    mus = alphas + betas * windowed_exog.values
    obs = pm.Normal('obs', mu=mus, sigma=sigmas, observed=windowed_endog.values)
    idata = pm.sample()
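After sampling, you get one posterior per window under the 'regression' dimension, so pulling out per-window point estimates is just a reduction over chains and draws. A sketch of that step on a mocked-up idata (the real one comes from pm.sample(); shapes and variable names here mirror the model above but the numbers are random):

```python
import numpy as np
import arviz as az

# Mock posterior with the layout pm.sample() would give: (chain, draw, regression)
rng = np.random.default_rng(1)
posterior = {
    "beta": rng.normal(size=(2, 100, 5)),
    "alpha": rng.normal(size=(2, 100, 5)),
}
idata = az.from_dict(posterior=posterior)

# One point estimate per rolling window, e.g. for plotting against statsmodels:
beta_means = idata.posterior["beta"].mean(dim=("chain", "draw"))
print(beta_means.shape)  # (5,)
```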
The "key move" is that I use broadcasting to compute mu: the betas get broadcast over the columns, so each column of mu corresponds to an independent regression. PyMC will likewise interpret the 2-D observed array as a collection of independent normal distributions when you hand it to pm.Normal to compute the likelihood.
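The broadcasting follows the ordinary NumPy rules: a length-n_regressions vector lines up against the trailing axis of a (window_size, n_regressions) matrix, so column j of mu is alpha[j] + beta[j] * x[:, j]. A toy sketch (all names and numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
window_size, n_reg = 4, 3
x = rng.normal(size=(window_size, n_reg))  # stand-in for windowed_exog
alpha = np.array([0.0, 1.0, -1.0])         # one intercept per regression
beta = np.array([2.0, 0.5, 3.0])           # one slope per regression

mu = alpha + beta * x                      # (n_reg,) broadcasts over the columns of x

# Each column of mu really is its own regression line:
for j in range(n_reg):
    assert np.allclose(mu[:, j], alpha[j] + beta[j] * x[:, j])
print(mu.shape)  # (4, 3)
```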
Results for the first 25 regressions (with statsmodels estimates as red triangles):