Hello! I’ve read this excellent reply by @OriolAbril on recommending passing “long data” rather than “wide data” (see a figure here for an example). I completely agree with this, and I’m trying to write my model code in this manner.
I’m doing time-series forecasting, and I’d like to add seasonal fourier features. I have observed values on two groups, so for a given date, I have an observed value for group A and another for group B. I am struggling with getting the shapes to work together, and would very much appreciate some pointers.
The data might look like this:
┌────────────┬───────┬───────────┐
│ date ┆ group ┆ actual │
│ --- ┆ --- ┆ --- │
│ date ┆ str ┆ f64 │
╞════════════╪═══════╪═══════════╡
│ 2023-01-01 ┆ A ┆ 2.091415 │
│ 2023-01-01 ┆ B ┆ -2.171598 │
│ 2023-01-02 ┆ A ┆ 1.892557 │
│ 2023-01-02 ┆ B ┆ -2.400566 │
└────────────┴───────┴───────────┘
I have shared the above dummy data and code in this gist.
It seems sensible for me that since we have 1 years worth of data, to instead of creating a 732 row long fourier feature, we create a 366 long one. This code does that:
def get_fourier_features(
dates: pd.Series, n_order: int, period: float
) -> tuple[pd.DataFrame, np.ndarray]:
day = dates.dt.day_of_year / period
labels = np.arange(2 * n_order)
features = pd.DataFrame(
{
f"{func.__name__}_order_{order}": func(2 * np.pi * day * order)
for order in range(1, n_order + 1)
for func in (np.sin, np.cos)
},
)
features.set_index(dates, inplace=True)
return features, labels
# monthly fourier series, we create a shape 365 x 12 dataframe here
fourier, fourier_labels = get_fourier_features(pd.Series(dates), 6, 30.5)
The following pymc model is my current attempt at making this work. The goal is making the model sample nicely with a b_fourier_monthly
of shape (2, 12)
.
coords = {
"index": index, # 732 rows
"date": date_labels, # 366 days
"group": group_labels, # 2 groups
"fourier": fourier_labels, # 12 fourier features
}
with pm.Model(coords=coords):
intercept = pm.Normal("intercept", mu=0, sigma=1, dims="group")
# HOW CAN I MAKE ONE B_FOURIER_MONTHLY PER GROUP (A, B)?
b_fourier_monthly = pm.Normal(
name="b_fourier_monthly",
mu=0,
sigma=1,
dims="fourier", # <--- Should I make this ("group", "fourier") instead?
)
seasonality_monthly = pm.Deterministic(
name="seasonality_monthly",
var=pymc.math.dot(b_fourier_monthly, fourier.T),
dims="date",
# this will currently have shape 366, or 2, 366 if I change the dims above
)
pm.Normal(
name="likelihood",
mu=(
intercept[group_idx]
+ seasonality_monthly # <-- this has shape (366,), while the intercept has shape (732,), so we get an error here
),
sigma=1,
observed=df_groups["actual"], # shape = (732, )
dims="index",
)
trace = pm.sample(1000)
Does anyone have any suggestions? As mentioned, the full code is here in this gist.