# How to structure hierarchical linear regression with multiple time series

I’m trying to build a Hierarchical Linear Regression model for data which comes from several time series. I’m adapting this example notebook.

I’ve build a hierarchical linear regression model where each time series has slopes and intercepts which come from a shared slope or shared intercept distribution. I have included a seasonality term, consisted of an indicator matrix (n_rows by the 12 calendar months) which denotes the month of an observation with 1, otherwise values are 0. I multiply this indicator matrix by a vector or coefficients that I call my seasonal indicators. For one time series this works but I don’t how the proper way to specify the model with adding additionally time series. I would like the model to return a vector or seasonal components for each time series.

Below is some code which creates some data and builds the model.

``````import arviz as az
import graphviz
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from statsmodels.tsa.deterministic import CalendarSeasonality

dates = pd.date_range(start='2021-01-15', periods=12, freq='M')
linear_response = np.linspace(2, 50, 12)
seasonality = np.sin(linear_response) * 20
rng = np.random.default_rng()

response_1 = linear_response + seasonality
response_2 = response_1 + rng.normal(scale=5, size=12)
response_3 = response_1 + rng.normal(scale=5, size=12) + 15
df = pd.DataFrame({'y1': response_1, 'y2': response_2, 'y3': response_3},
index=dates)

# Restructure data so that observations from 3 different columns are in one
# column. Another column 'variable' contains information on which time series
# each data point originated.
df = df.melt(ignore_index=False).sort_index()
scaled_time = (df.index - df.index.min()) / (df.index.max() - df.index.min())
df['scaled_time'] = scaled_time
df['scaled_value'] = df['value'] / df['value'].max()

indices, str_names = df['variable'].factorize()
coords = {
'var_names': str_names,
'var_indices': np.arange(len(indices))
}

# Create hierarchial linear model where parameters for a (intercept)
# and b (slope) are shared between time series.
with pm.Model(coords=coords) as hier_linear:
var_idx = pm.Data('indices', indices, dims='var_indices')
mu_a = pm.Normal('mu_a', mu=0.3, sigma=0.3)
sigma_a = pm.HalfNormal('sigma_a', 0.2)
mu_b = pm.Normal('mu_b', mu=0.4, sigma=0.3)
sigma_b = pm.HalfNormal('sigma_b', 0.15)

a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')

y_estimate = a[var_idx] + b[var_idx] * scaled_time.values
error = pm.HalfNormal('error', 0.1)
``````

The hierarchical linear model is structured like this:

I would like it look like a hierarchical verision of this model:

However, I can’t broadcast my seasonal indicator variable to the correct shape- shown in the code block below:

``````indices, str_names = df['variable'].factorize()
coords = {
'calendar_features': np.arange(seasonal_indicator_df.shape[1]),
'var_names': str_names,
'var_indices': np.arange(len(indices))
}

with pm.Model(coords=coords) as hier_linear_w_seasonality:
var_idx = pm.Data('indices', indices, dims='var_indices')
mu_a = pm.Normal('mu_a', mu=30, sigma=0.3)
sigma_a = pm.HalfNormal('sigma_a', 0.10)
mu_b = pm.Normal('mu_b', mu=15, sigma=0.10)
sigma_b = pm.HalfNormal('sigma_b', 0.10)

a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')

b_calendar = pm.Normal('b_cal', mu=0, sigma=0.3,
dims=('calendar_features', 'var_names'))
# I would like b_calendar to be 12 * [number of timeseries] in dimension
seasonality = pm.Deterministic('seasonality',
pm.math.dot(b_calendar[var_idx],
seasonal_indicator_df.to_numpy().T))

y_estimate = (a[var_idx] + b[var_idx] * single_ts['scaled_time'].to_numpy())
+ seasonality
error = pm.HalfNormal('error', 0.15)

data_like = pm.Normal('y_expected', mu=y_estimate, sigma=error, observed=df['value'])
``````

A `ValueError: shapes (36,3) and (12,12) not aligned: 3 (dim 1) != 12 (dim 0)` is thrown by the line specifying seasonality `seasonality = pm.Deterministic('seasonality', pm.math.dot(b_calendar[var_idx], seasonal_indicator_df.to_numpy().T))`.

Any pointers as to how to specific the shape of variable would be much appreciated!

1 Like

I ended up using the model below to solve my problems. Main things I did were:

• Restructure my dataset so that each time series in its own column
• Create a shared variable from my dataframe using `pm.Data` (follow this example in the docs)
• When wanting to multiple a vector by another vector with different first dimensions I have used `np.asarray(1, my_vector)` Based on this talk by Meenal Jhajharia, specifically the section regarding broadcasting

The final code and model graph:

``````coords = {
'calendar_features': np.arange(seasonal_indicator_df.shape[1]),
'time': new_data.index.to_numpy(),
'time_series': (new_data.columns)
}

with pm.Model(coords=coords) as hier_linear_w_seasonality:
data = pm.Data('time series', new_data, dims=('time', 'time_series'))
mu_a = pm.Normal('mu_a', mu=.30, sigma=0.2)
sigma_a = pm.HalfNormal('sigma_a', 0.2)
mu_b = pm.Normal('mu_b', mu=.15, sigma=0.2)
sigma_b = pm.HalfNormal('sigma_b', 0.2)

a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='time_series')
b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='time_series')

b_calendar = pm.Normal('b_cal', mu=0, sigma=0.3, dims=('calendar_features', 'time_series'))
seasonality = pm.Deterministic('seasonality', pm.math.dot(seasonal_indicator_df.to_numpy().T, b_calendar))

trend = pm.Deterministic('trend', a + (b * np.asarray(1, new_data.index)))
y_estimate = trend + seasonality
error = pm.HalfNormal('error', 0.15)

data_like = pm.Normal('y_expected', mu=y_estimate, sigma=error, observed=data)
``````

5 Likes