How to structure hierarchical linear regression with multiple time series

I’m trying to build a Hierarchical Linear Regression model for data which comes from several time series. I’m adapting this example notebook.

I’ve build a hierarchical linear regression model where each time series has slopes and intercepts which come from a shared slope or shared intercept distribution. I have included a seasonality term, consisted of an indicator matrix (n_rows by the 12 calendar months) which denotes the month of an observation with 1, otherwise values are 0. I multiply this indicator matrix by a vector or coefficients that I call my seasonal indicators. For one time series this works but I don’t how the proper way to specify the model with adding additionally time series. I would like the model to return a vector or seasonal components for each time series.

Below is some code which creates some data and builds the model.

import arviz as az
import graphviz
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm
from statsmodels.tsa.deterministic import CalendarSeasonality


dates = pd.date_range(start='2021-01-15', periods=12, freq='M')
linear_response = np.linspace(2, 50, 12)
seasonality = np.sin(linear_response) * 20
rng = np.random.default_rng()

response_1 = linear_response + seasonality
response_2 = response_1 + rng.normal(scale=5, size=12)
response_3 = response_1 + rng.normal(scale=5, size=12) + 15
df = pd.DataFrame({'y1': response_1, 'y2': response_2, 'y3': response_3},
             index=dates)

# Restructure data so that observations from 3 different columns are in one
# column. Another column 'variable' contains information on which time series
# each data point originated.
df = df.melt(ignore_index=False).sort_index()
scaled_time = (df.index - df.index.min()) / (df.index.max() - df.index.min())
df['scaled_time'] = scaled_time
df['scaled_value'] = df['value'] / df['value'].max()

indices, str_names = df['variable'].factorize()
coords = {
    'var_names': str_names,
    'var_indices': np.arange(len(indices))
}

# Create hierarchial linear model where parameters for a (intercept) 
# and b (slope) are shared between time series. 
with pm.Model(coords=coords) as hier_linear:
    var_idx = pm.Data('indices', indices, dims='var_indices')
    mu_a = pm.Normal('mu_a', mu=0.3, sigma=0.3)
    sigma_a = pm.HalfNormal('sigma_a', 0.2)
    mu_b = pm.Normal('mu_b', mu=0.4, sigma=0.3)
    sigma_b = pm.HalfNormal('sigma_b', 0.15)
   
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')
    
    y_estimate = a[var_idx] + b[var_idx] * scaled_time.values
    error = pm.HalfNormal('error', 0.1)

The hierarchical linear model is structured like this:

I would like it look like a hierarchical verision of this model:

However, I can’t broadcast my seasonal indicator variable to the correct shape- shown in the code block below:

indices, str_names = df['variable'].factorize()
coords = {
    'calendar_features': np.arange(seasonal_indicator_df.shape[1]),
    'var_names': str_names,
    'var_indices': np.arange(len(indices))
}

with pm.Model(coords=coords) as hier_linear_w_seasonality:
    var_idx = pm.Data('indices', indices, dims='var_indices')
    mu_a = pm.Normal('mu_a', mu=30, sigma=0.3)
    sigma_a = pm.HalfNormal('sigma_a', 0.10)
    mu_b = pm.Normal('mu_b', mu=15, sigma=0.10)
    sigma_b = pm.HalfNormal('sigma_b', 0.10)
   
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='var_names')
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='var_names')
    
    b_calendar = pm.Normal('b_cal', mu=0, sigma=0.3,
                           dims=('calendar_features', 'var_names'))
    # I would like b_calendar to be 12 * [number of timeseries] in dimension
    seasonality = pm.Deterministic('seasonality',
                  pm.math.dot(b_calendar[var_idx],
                  seasonal_indicator_df.to_numpy().T))
    
    y_estimate = (a[var_idx] + b[var_idx] * single_ts['scaled_time'].to_numpy())
                  + seasonality
    error = pm.HalfNormal('error', 0.15)
    
    data_like = pm.Normal('y_expected', mu=y_estimate, sigma=error, observed=df['value'])

A ValueError: shapes (36,3) and (12,12) not aligned: 3 (dim 1) != 12 (dim 0) is thrown by the line specifying seasonality seasonality = pm.Deterministic('seasonality', pm.math.dot(b_calendar[var_idx], seasonal_indicator_df.to_numpy().T)).

Any pointers as to how to specific the shape of variable would be much appreciated!

1 Like

I ended up using the model below to solve my problems. Main things I did were:

  • Restructure my dataset so that each time series in its own column
  • Create a shared variable from my dataframe using pm.Data (follow this example in the docs)
  • When wanting to multiple a vector by another vector with different first dimensions I have used np.asarray(1, my_vector) Based on this talk by Meenal Jhajharia, specifically the section regarding broadcasting

The final code and model graph:

coords = {
    'calendar_features': np.arange(seasonal_indicator_df.shape[1]),
    'time': new_data.index.to_numpy(),
    'time_series': (new_data.columns)
}

with pm.Model(coords=coords) as hier_linear_w_seasonality:
    data = pm.Data('time series', new_data, dims=('time', 'time_series'))
    mu_a = pm.Normal('mu_a', mu=.30, sigma=0.2)
    sigma_a = pm.HalfNormal('sigma_a', 0.2)
    mu_b = pm.Normal('mu_b', mu=.15, sigma=0.2)
    sigma_b = pm.HalfNormal('sigma_b', 0.2)
   
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, dims='time_series')
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, dims='time_series')
    
    b_calendar = pm.Normal('b_cal', mu=0, sigma=0.3, dims=('calendar_features', 'time_series'))
    seasonality = pm.Deterministic('seasonality', pm.math.dot(seasonal_indicator_df.to_numpy().T, b_calendar))
    
    trend = pm.Deterministic('trend', a + (b * np.asarray(1, new_data.index)))
    y_estimate = trend + seasonality
    error = pm.HalfNormal('error', 0.15)
    
    data_like = pm.Normal('y_expected', mu=y_estimate, sigma=error, observed=data)

5 Likes