# Integrating Hierarchical Data at Multiple Levels in PyMC for Forecasting

I’m working on forecasting multiple time series to predict the future performance of marketing campaigns. My dataset includes campaigns (low level) and accounts (high level), where campaign volumes don’t sum directly to account volumes. I want to forecast campaign volumes using hierarchical modeling in PyMC, integrating both campaign-level and account-level data.

The model uses non-centered partial pooling for campaigns and tries to incorporate account-level data to better inform the campaign forecasts. I’m particularly focused on automating the selection of good priors, since this model will sit behind an API where manual tuning is not an option.
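One option I am considering for making fixed priors such as `Normal(0, 1)` reasonable without manual tuning is to rescale the inputs before they reach the model. Below is a minimal sketch, assuming NumPy arrays of shape `(n_time, n_series)`. The helper names `standardize` and `unstandardize` are hypothetical (not part of PyMC), and the data is made up for illustration rather than taken from the gist:

```python
import numpy as np

def standardize(y):
    """Z-score each column; also return the constants needed to undo it."""
    y = np.asarray(y, dtype=float)
    mean = y.mean(axis=0)
    std = y.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant series
    return (y - mean) / std, mean, std

def unstandardize(z, mean, std):
    """Map standardized values (e.g. forecasts) back to the original scale."""
    return np.asarray(z) * std + mean

# Made-up data: two series on very different scales.
rng = np.random.default_rng(0)
t = np.arange(50)
raw = np.column_stack([1000 + 20 * t, 5 + 0.1 * t]) + rng.normal(size=(50, 2))

scaled, mean, std = standardize(raw)
# Time rescaled to [0, 1], so a unit-scale slope prior spans the whole window.
time_scaled = t / t.max()
```

One caveat I am unsure about: standardizing each series separately means the pooled slopes are shared in standardized units, not in raw volumes.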

Here’s the simplified model structure:

```python
import numpy as np
import pymc as pm  # PyMC v4+; on older installs this would be `import pymc3 as pm`

# low_level_data and high_level_data are assumed to be loaded already (see the gist).
# low_level_data is expected to have shape (n_time, 2);
# high_level_data must broadcast against (n_time, 1).

with pm.Model() as model:
    time = np.arange(len(low_level_data))

    # High level: linear trend for the account
    a = pm.Normal("a", mu=0, sigma=1)
    b = pm.Normal("b", mu=0, sigma=1)
    high_mu = pm.Deterministic("high_mu", a + b * time)

    ## Likelihood 1
    high_sigma = pm.HalfNormal("high_sigma", sigma=1)
    high_likelihood = pm.Normal(
        "high_likelihood",
        mu=high_mu[:, None],
        sigma=high_sigma,
        observed=high_level_data,
    )

    # Low level: one intercept and one slope per campaign
    ## Intercept
    intercept = pm.Normal("intercept", mu=0, sigma=1, shape=2)

    ## Slope (non-centered), centered on the account-level slope `b`
    initial_slope = pm.Normal(
        "initial_slope", mu=b, sigma=1
    )  # !!! Prior is posterior from high level !!!
    offset_slope = pm.Normal("offset_slope", mu=0, sigma=1, shape=2)
    sigma_slope = pm.HalfNormal("sigma_slope", sigma=1)
    slope = pm.Deterministic("slope", initial_slope + offset_slope * sigma_slope)
    mu = pm.Deterministic("mu", intercept + slope * time[:, None])

    ## Likelihood 2
    sigma = pm.HalfNormal("sigma", sigma=1)
    likelihood = pm.Normal("likelihood", mu=mu, sigma=sigma, observed=low_level_data)
```
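To sanity-check the non-centered slope and the broadcasting in `mu` outside of PyMC, the same arithmetic can be reproduced with plain NumPy draws. This is only a sketch with made-up values for the slope and its spread; it does not touch the real data or the sampler:

```python
import numpy as np

rng = np.random.default_rng(1)
n_time, n_campaigns = 30, 2
time = np.arange(n_time)

initial_slope = 0.5   # illustrative value for the shared slope
sigma_slope = 0.2     # illustrative value for the between-campaign spread
offset_slope = rng.normal(0.0, 1.0, size=n_campaigns)  # standard-normal offsets
intercept = rng.normal(0.0, 1.0, size=n_campaigns)

# Non-centered form: same distribution as Normal(initial_slope, sigma_slope)
slope = initial_slope + offset_slope * sigma_slope

# (n_time, 1) * (n_campaigns,) broadcasts to (n_time, n_campaigns)
mu = intercept + slope * time[:, None]
print(mu.shape)  # (30, 2)

# Empirical check of the equivalence using many draws
draws = initial_slope + rng.normal(size=100_000) * sigma_slope
print(draws.mean(), draws.std())
```

The mean and standard deviation of `draws` should land close to `initial_slope` and `sigma_slope`, which is what the non-centered form is meant to preserve while giving the sampler unit-scale offsets to work with.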

As you can see, I have two observed datasets, and I am attempting to use the higher-level slope `b` as the prior mean (`mu`) of `initial_slope`.

1. Is this an effective way to integrate hierarchical data at multiple levels? Are there pitfalls or improvements I should consider?
2. Should I sample all parameters simultaneously, or is it advisable to fit the account-level model first and feed its result into the campaign-level model for better convergence?
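For concreteness, the two-stage alternative in question 2 would mean summarising the account-level fit first and passing that summary to the campaign model as a fixed prior. Here is a rough sketch that uses ordinary least squares as a stand-in for the account-level posterior. That substitution is an assumption for illustration only (in practice the summary would come from the PyMC trace), and the data is simulated:

```python
import numpy as np

rng = np.random.default_rng(2)
time = np.linspace(0.0, 1.0, 40)
# Simulated account-level series with a true slope of 2.0
account = 1.0 + 2.0 * time + rng.normal(0.0, 0.3, size=time.size)

# Stage 1: fit the account level.
# np.polyfit returns coefficients as [slope, intercept] plus their covariance.
coefs, cov = np.polyfit(time, account, deg=1, cov=True)
b_mean = coefs[0]
b_sd = np.sqrt(cov[0, 0])

# Stage 2 (not run here): these would become fixed hyperparameters, e.g.
# initial_slope = pm.Normal("initial_slope", mu=b_mean, sigma=b_sd)
print(b_mean, b_sd)
```

My understanding is that this two-stage version reduces the account-level fit to a mean and a standard deviation, so the campaign data can no longer update `b`, whereas sampling everything jointly lets both datasets inform the shared slope.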

Any examples, insights, or references to similar work would be immensely helpful. I’ve also shared a gist with some sample data for reference.