I’m trying to build a Hierarchical Linear Regression model for data which comes from several time series. I’m adapting this example notebook.
I’ve build a hierarchical linear regression model where each time series has slopes and intercepts which come from a shared slope or shared intercept distribution. I have included a seasonality term, consisted of an indicator matrix (n_rows by the 12 calendar months) which denotes the month of an observation with 1, otherwise values are 0. I multiply this indicator matrix by a vector or coefficients that I call my seasonal indicators. For one time series this works but I don’t how the proper way to specify the model with adding additionally time series. I would like the model to return a vector or seasonal components for each time series.
Below is some code which creates some data and builds the model.
import arviz as az
import graphviz
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from statsmodels.tsa.deterministic import CalendarSeasonality
dates = pd.date_range(start='2021-01-15', periods=12, freq='M')
linear_response = np.linspace(2, 50, 12)
seasonality = np.sin(linear_response) * 20
rng = np.random.default_rng()
response_1 = linear_response + seasonality
response_2 = response_1 + rng.normal(scale=5, size=12)
response_3 = response_1 + rng.normal(scale=5, size=12) + 15
df = pd.DataFrame({'y1': response_1, 'y2': response_2, 'y3': response_3},
index=dates)
# Restructure data so that observations from 3 different columns are in one
# column. Another column 'variable' contains information on which time series
# each data point originated.
df = df.melt(ignore_index=False).sort_index()
scaled_time = (df.index - df.index.min()) / (df.index.max() - df.index.min())
df['scaled_time'] = scaled_time
df['scaled_value'] = df['value'] / df['value'].max()
indices, str_names = df['variable'].factorize()
coords = {
'var_names': str_names,
'var_indices': np.arange(len(indices))
}
# Create hierarchial linear model where parameters for a (intercept)
# and b (slope) are shared between time series.
with pm.Model(coords=coords) as hier_linear:
var_idx = pm.Data('indices', indices, dims='var_indices')
mu_a = pm.Normal('mu_a', mu=0.3, sigma=0.3)
sigma_a = pm.HalfNormal('sigma_a', 0.2)
mu_b = pm.Normal('mu_b', mu=0.4, sigma=0.3)
sigma_b = pm.HalfNormal('sigma_b', 0.15)
a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')
y_estimate = a[var_idx] + b[var_idx] * scaled_time.values
error = pm.HalfNormal('error', 0.1)
The hierarchical linear model is structured like this:
I would like it look like a hierarchical verision of this model:
However, I can’t broadcast my seasonal indicator variable to the correct shape- shown in the code block below:
indices, str_names = df['variable'].factorize()
coords = {
'calendar_features': np.arange(seasonal_indicator_df.shape[1]),
'var_names': str_names,
'var_indices': np.arange(len(indices))
}
with pm.Model(coords=coords) as hier_linear_w_seasonality:
var_idx = pm.Data('indices', indices, dims='var_indices')
mu_a = pm.Normal('mu_a', mu=30, sigma=0.3)
sigma_a = pm.HalfNormal('sigma_a', 0.10)
mu_b = pm.Normal('mu_b', mu=15, sigma=0.10)
sigma_b = pm.HalfNormal('sigma_b', 0.10)
a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')
b_calendar = pm.Normal('b_cal', mu=0, sigma=0.3,
dims=('calendar_features', 'var_names'))
# I would like b_calendar to be 12 * [number of timeseries] in dimension
seasonality = pm.Deterministic('seasonality',
pm.math.dot(b_calendar[var_idx],
seasonal_indicator_df.to_numpy().T))
y_estimate = (a[var_idx] + b[var_idx] * single_ts['scaled_time'].to_numpy())
+ seasonality
error = pm.HalfNormal('error', 0.15)
data_like = pm.Normal('y_expected', mu=y_estimate, sigma=error, observed=df['value'])
A ValueError: shapes (36,3) and (12,12) not aligned: 3 (dim 1) != 12 (dim 0)
is thrown by the line specifying seasonality seasonality = pm.Deterministic('seasonality', pm.math.dot(b_calendar[var_idx], seasonal_indicator_df.to_numpy().T))
.
Any pointers as to how to specific the shape of variable would be much appreciated!