I’ve been playing around with the PyMCStateSpace class in pymc-experimental. Today I’m attempting to recreate the following figure from Commandeur & Koopman, page 24:
I have used the following code to fit the model with PyMC:
import pandas as pd
import pymc as pm
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pytensor
import pytensor.tensor as pt
from pymc_experimental.statespace import structural as st
import pymc_experimental.statespace
# Load and clean the data
dat = pd.read_csv("data/ukdriversksi.txt", skiprows=1, header=None, names=['y'])
date_range = pd.date_range(start='1969-01-01', periods=192, freq='MS')
dat['log_y'] = np.log(dat['y'])
dat['date'] = date_range
# Set up the matrixes
ll = st.LevelTrendComponent(order=2)
ss_mod = ll.build()
# Declare the model and samplle
with pm.Model(coords=ss_mod.coords) as model:
P0_diag = pm.Gamma("P0_diag", alpha=2, beta=1, dims=['state'])
P0 = pm.Deterministic("P0", pt.diag(P0_diag), dims=['state', 'state_aux'])
initial_trend = pm.Normal("initial_trend", sigma=[10, 10], dims=['trend_state'])
sigma_trend = pm.Gamma("sigma_trend", alpha=2, beta=1, dims=['trend_shock'])
ss_mod.build_statespace_graph(dat['log_y'], mode="JAX")
idata = pm.sample(nuts_sampler="numpyro")
# Generate posterior predictive samples
with model:
pm.sample_posterior_predictive(idata, extend_inferencedata=True)
# Convert to a pandas df for my sanity
posterior_predictive_df = idata.posterior_predictive.to_dataframe().reset_index()
# Create a 'date' column to match the dataset
posterior_predictive_df['date'] = posterior_predictive_df['time'].apply(lambda x: date_range[x])
# Compute the grouped mean value of 'obs'
posterior_predictive_df['post_pred_mean'] = posterior_predictive_df.groupby('date')['obs'].transform('mean')
# Keep only unique values
posterior_predictive_df = posterior_predictive_df.groupby('date')['post_pred_mean'].agg('mean').reset_index()
# Janky manual shift
# posterior_predictive_df['post_pred_mean'] = posterior_predictive_df['post_pred_mean'].shift(-1)
# Plot
plt.figure(figsize=(14, 7))
plt.plot(posterior_predictive_df['date'], posterior_predictive_df['post_pred_mean'], label='Posterior predictive mean', color='red')
plt.plot(dat['date'], dat['log_y'], label='Observed data', color='blue')
The posterior predictions seem to lead the observed timepoints by 1. It seems like this could be due to:
- Misunderstanding on my part of how indexing works with these models;
- Error in how I’ve specified the model;
- Error in my data munging.
Thanks in advance for your help: probably this is just a small thing, but I’m new to PyMC and have gotten stumped on how to debug. Hoping to get to the bottom of it before I try to fit more complex models from that book.