Optimizing Seasonality Detection in PyMC: Seeking Alternatives to DiracDelta for Efficient Reshaping and Future unseen coordinates Handling

geraudd · November 14, 2023, 3:56pm

Hello,

I’m working on a PyMC model to detect seasonality patterns in several time series, grouped by month. I aim to have each group share a common distribution for each month and a single sigma across all months and groups. Additionally, I want the model to accommodate new groups in future predictions using dimensions and coordinates.

In a previous iteration of this model (described in the ‘False’ section of the code), I utilized a Laplace distribution defined over GROUP_ID and MONTHS dimensions. This setup supported new coordinates automatically. However, it resulted in too many distributions to fit, leading to suboptimal results. To streamline this, I attempted to eliminate the Laplace distribution. Since only distributions seem to handle unseen coordinates, I used a DiracDelta distribution to reshape my ‘mu’ parameter into a matrix with GROUP_ID and MONTH dimensions. While the reshape was successful, DiracDelta lacks a sampler, as illustrated in the ‘True’ section of the provided code.

My question is: Is there an alternative to DiracDelta that can reshape a ‘mu’ vector of size MONTH into a tiled matrix with dimensions GROUP_ID x MONTH? This alternative should also support new coordinates in future predictions.

Here is the code example illustrating my current approach and the previous one:

import pymc as pm
import pandas as pd
import numpy as np

# create some coordinates
coords = {
    "MONTHS": list(range(1, 13)),
    "GROUP_ID": ["a", "b", "c"],
    "DATE": pd.date_range("2021-01-01", periods=24, freq="MS"),
}

# Add the coordinates to the model and make them mutable
with pm.Model() as model:
    for key, value in coords.items():
        model.add_coord(name=key, values=value, mutable=True)


# Make MutableData
months = coords["DATE"].month.values
months_eye = np.eye(12)[months - 1, :]

with model:
    months = pm.MutableData("months", months_eye, dims=("DATE", "MONTHS"))


y = np.random.randn(24, 3)
if True:  # Define the model - version that I want to use
    with model:
        mu = pm.Normal("mu", mu=0, sigma=1, dims="MONTHS")
        # Using DiracDelta to reshape the mu vector
        reshape = pm.DiracDelta("reshape", c=mu, dims=("GROUP_ID", "MONTHS"))
        sigma = pm.HalfNormal("sigma", sigma=1)
        indexes = pm.Deterministic("indexes", reshape * sigma, dims="MONTHS")
        monthly_indexes = pm.Deterministic(
            "monthly_indexes", months_eye @ indexes.T, dims=("DATE", "GROUP_ID")
        )
        # Likelihood
        pm.Normal(
            "y", mu=monthly_indexes, sigma=1, observed=y, dims=("DATE", "GROUP_ID")
        )

        # This fails because there is obviously no sampler for DiracDelta
        idata = pm.sample(draws=100, tune=100, chains=1, cores=1)
        new_groups = coords["GROUP_ID"] + ["d", "e"]
        model.set_dim("GROUP_ID", len(new_groups), new_groups)
        idata = pm.sample_posterior_predictive(
            idata, predictions=True, extend_inferencedata=True
        )

    # This model should automatically handles new GROUP_IDs
    assert list(idata.posterior.GROUP_ID.values) == ["a", "b", "c"]
    assert list(idata.predictions.GROUP_ID.values) == ["a", "b", "c", "d", "e"]

else:  # previous version of the model that worked and could hamdle new GROUP_IDs
    with model:
        mu_offset = pm.Normal("mu_offset", mu=0, sigma=1, dims="MONTHS")
        sigma_offset = pm.HalfNormal("sigma_offset", sigma=1)
        # This is not what I want because I get a ton of distributions MONTHS x GROUPS
        indexes = pm.Laplace(
            "indexes", mu=mu_offset, b=sigma_offset, dims=("GROUP_ID", "MONTHS")
        ).T
        monthly_indexes = pm.Deterministic(
            "monthly_indexes", months @ indexes, dims=("DATE", "GROUP_ID")
        )
        # Likelihood
        pm.Normal(
            "y", mu=monthly_indexes, sigma=1, observed=y, dims=("DATE", "GROUP_ID")
        )
        idata = pm.sample(draws=100, tune=100, chains=1, cores=1)
        new_groups = coords["GROUP_ID"] + ["d", "e"]
        model.set_dim("GROUP_ID", len(new_groups), new_groups)
        idata = pm.sample_posterior_predictive(
            idata, predictions=True, extend_inferencedata=True
        )

    # This model automatically handles new GROUP_IDs
    assert list(idata.posterior.GROUP_ID.values) == ["a", "b", "c"]
    assert list(idata.predictions.GROUP_ID.values) == ["a", "b", "c", "d", "e"]

Topic		Replies	Views
Multi-dimensional Dims Do not Seem to Work v5 modeling	3	865	July 11, 2022
Out of sample prediction v5 modeling	20	1846	April 7, 2023
Modelling long data with multiple groups	3	346	February 24, 2024
PyMC v5.10.3 prediction stuff v5 modeling	11	433	January 15, 2024
Reshaping within PYMC4 v5	2	640	September 1, 2022

Optimizing Seasonality Detection in PyMC: Seeking Alternatives to DiracDelta for Efficient Reshaping and Future unseen coordinates Handling

Related topics