Hello,
I’m working on a PyMC model to detect seasonality patterns in several time series, grouped by month. I aim to have each group share a common distribution for each month and a single sigma across all months and groups. Additionally, I want the model to accommodate new groups in future predictions using dimensions and coordinates.
In a previous iteration of this model (described in the ‘False’ section of the code), I utilized a Laplace distribution defined over GROUP_ID and MONTHS dimensions. This setup supported new coordinates automatically. However, it resulted in too many distributions to fit, leading to suboptimal results. To streamline this, I attempted to eliminate the Laplace distribution. Since only distributions seem to handle unseen coordinates, I used a DiracDelta distribution to reshape my ‘mu’ parameter into a matrix with GROUP_ID and MONTH dimensions. While the reshape was successful, DiracDelta lacks a sampler, as illustrated in the ‘True’ section of the provided code.
My question is: Is there an alternative to DiracDelta that can reshape a ‘mu’ vector of size MONTH into a tiled matrix with dimensions GROUP_ID x MONTH? This alternative should also support new coordinates in future predictions.
Here is the code example illustrating my current approach and the previous one:
import pymc as pm
import pandas as pd
import numpy as np
# create some coordinates
coords = {
"MONTHS": list(range(1, 13)),
"GROUP_ID": ["a", "b", "c"],
"DATE": pd.date_range("2021-01-01", periods=24, freq="MS"),
}
# Add the coordinates to the model and make them mutable
with pm.Model() as model:
for key, value in coords.items():
model.add_coord(name=key, values=value, mutable=True)
# Make MutableData
months = coords["DATE"].month.values
months_eye = np.eye(12)[months - 1, :]
with model:
months = pm.MutableData("months", months_eye, dims=("DATE", "MONTHS"))
y = np.random.randn(24, 3)
if True: # Define the model - version that I want to use
with model:
mu = pm.Normal("mu", mu=0, sigma=1, dims="MONTHS")
# Using DiracDelta to reshape the mu vector
reshape = pm.DiracDelta("reshape", c=mu, dims=("GROUP_ID", "MONTHS"))
sigma = pm.HalfNormal("sigma", sigma=1)
indexes = pm.Deterministic("indexes", reshape * sigma, dims="MONTHS")
monthly_indexes = pm.Deterministic(
"monthly_indexes", months_eye @ indexes.T, dims=("DATE", "GROUP_ID")
)
# Likelihood
pm.Normal(
"y", mu=monthly_indexes, sigma=1, observed=y, dims=("DATE", "GROUP_ID")
)
# This fails because there is obviously no sampler for DiracDelta
idata = pm.sample(draws=100, tune=100, chains=1, cores=1)
new_groups = coords["GROUP_ID"] + ["d", "e"]
model.set_dim("GROUP_ID", len(new_groups), new_groups)
idata = pm.sample_posterior_predictive(
idata, predictions=True, extend_inferencedata=True
)
# This model should automatically handles new GROUP_IDs
assert list(idata.posterior.GROUP_ID.values) == ["a", "b", "c"]
assert list(idata.predictions.GROUP_ID.values) == ["a", "b", "c", "d", "e"]
else: # previous version of the model that worked and could hamdle new GROUP_IDs
with model:
mu_offset = pm.Normal("mu_offset", mu=0, sigma=1, dims="MONTHS")
sigma_offset = pm.HalfNormal("sigma_offset", sigma=1)
# This is not what I want because I get a ton of distributions MONTHS x GROUPS
indexes = pm.Laplace(
"indexes", mu=mu_offset, b=sigma_offset, dims=("GROUP_ID", "MONTHS")
).T
monthly_indexes = pm.Deterministic(
"monthly_indexes", months @ indexes, dims=("DATE", "GROUP_ID")
)
# Likelihood
pm.Normal(
"y", mu=monthly_indexes, sigma=1, observed=y, dims=("DATE", "GROUP_ID")
)
idata = pm.sample(draws=100, tune=100, chains=1, cores=1)
new_groups = coords["GROUP_ID"] + ["d", "e"]
model.set_dim("GROUP_ID", len(new_groups), new_groups)
idata = pm.sample_posterior_predictive(
idata, predictions=True, extend_inferencedata=True
)
# This model automatically handles new GROUP_IDs
assert list(idata.posterior.GROUP_ID.values) == ["a", "b", "c"]
assert list(idata.predictions.GROUP_ID.values) == ["a", "b", "c", "d", "e"]