Integrating Hierarchical Data at Multiple Levels in PyMC for Forecasting

geraudd · March 21, 2024, 9:49am

I’m working on forecasting multiple time series to predict the future performance of marketing campaigns. My dataset includes campaigns (low level) and accounts (high level), where campaign volumes don’t directly sum to account volumes. I aim to forecast campaign volumes using hierarchical modeling in PyMC, integrating both campaign-level and account-level data.

The model uses non-centered partial pooling for campaigns and aims to incorporate account-level data to better inform the campaign forecasts. I’m particularly focused on automating the selection of good priors, since this model will be part of an API and the priors cannot be adjusted manually.
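One way fixed priors such as Normal(0, 1) could stay reasonable without manual tuning is to rescale every series before fitting. The following is only a minimal sketch under that assumption: `standardize` and `unstandardize` are hypothetical helper names (not part of PyMC), and the demo data is simulated rather than taken from my dataset.

```python
import numpy as np


def standardize(y):
    """Scale each column (one series per column) to zero mean and unit std."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    std = np.where(std > 0, std, 1.0)  # guard against constant series
    return (y - mean) / std, mean, std


def unstandardize(z, mean, std):
    """Map standardized values (e.g. forecasts) back to the original scale."""
    return np.asarray(z, dtype=float) * std + mean


# Simulated stand-in for two campaigns on very different scales (assumption)
rng = np.random.default_rng(0)
t = np.arange(50)
low_level_demo = np.column_stack([100 + 3.0 * t, 2000 - 10.0 * t])
low_level_demo = low_level_demo + rng.normal(0, 5, size=low_level_demo.shape)

z, mean, std = standardize(low_level_demo)
restored = unstandardize(z, mean, std)
```

The standardized array `z` would be passed as the observed data, and posterior predictions would be mapped back with `unstandardize` before being returned by the API.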

Here’s the simplified model structure:

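import numpy as np
import pymc as pm

with pm.Model() as model:
    # Time index shared by both levels (one entry per observation row)
    time = np.arange(len(low_level_data))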
    # High level
    a = pm.Normal("a", mu=0, sigma=1)
    b = pm.Normal("b", mu=0, sigma=1)
    high_mu = pm.Deterministic("high_mu", a + b * time)

    ## Likelihood 1
    high_sigma = pm.HalfNormal("high_sigma", sigma=1)
    high_likelihood = pm.Normal(
        "high_likelihood",
        mu=high_mu[:, None],
        sigma=high_sigma,
        observed=high_level_data,
    )
    # Low level
    ## Intercept
    intercept = pm.Normal("intercept", mu=0, sigma=1, shape=2)
    initial_slope = pm.Normal(
        "initial_slope", mu=b, sigma=1
    )  # !!! Prior is posterior from high level !!!
    offset_slope = pm.Normal("offset_slope", mu=0, sigma=1, shape=2)
    sigma_slope = pm.HalfNormal("sigma_slope", sigma=1)
    slope = pm.Deterministic("slope", initial_slope + offset_slope * sigma_slope)
    mu = pm.Deterministic("mu", intercept + slope * time[:, None])

    ## Likelihood 2
    sigma = pm.HalfNormal("sigma", sigma=1)
    likelihood = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=low_level_data)

As you can see, I have two observed datasets and am attempting to use the higher-level slope b as the prior mean (mu) of initial_slope.
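The shapes in the model rely on NumPy-style broadcasting. Here is a small sketch with plain NumPy values standing in for a single draw of each parameter, assuming two campaigns and five time steps; all numbers are made up for illustration.

```python
import numpy as np

T, n_campaigns = 5, 2
time = np.arange(T)

# Arbitrary stand-ins for one draw of each random variable
a, b = 0.5, 0.1
intercept = np.array([1.0, 2.0])       # one intercept per campaign
initial_slope = b                      # account-level slope used as the shared mean
offset_slope = np.array([-1.0, 1.0])   # standard-normal offsets per campaign
sigma_slope = 0.2

# Account-level mean: shape (T,), expanded to (T, 1) for the likelihood
high_mu = a + b * time
high_mu_2d = high_mu[:, None]

# Non-centered slope: shared mean plus scaled offset, shape (n_campaigns,)
slope = initial_slope + offset_slope * sigma_slope

# Campaign-level mean: (n_campaigns,) + (n_campaigns,) * (T, 1) -> (T, n_campaigns)
mu = intercept + slope * time[:, None]

print(high_mu_2d.shape, slope.shape, mu.shape)
```

So low_level_data needs to be a 2D array with one column per campaign, and high_level_data a 2D array whose second dimension broadcasts against a single column.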

I’m seeking advice on:

1. Feedback on the effectiveness of my approach for integrating hierarchical data at multiple levels. Are there potential pitfalls or improvements I should consider?
2. Should I sample all parameters simultaneously, or is it advisable to fit the account-level model first for better convergence?

Any examples, insights, or references to similar work would be immensely helpful. I’ve also shared a gist with some sample data for reference.

Thank you in advance for your help and suggestions!
