Hi there! I am trying to model the price elasticity of different products, while controlling for seasonality.
Here is how one can generate the data:
import numpy as np
import pandas as pd
np.random.seed(42)
n_months = 12
n_floorplans = 3
n_obs = n_months * n_floorplans
data = {
"month_id": np.repeat(range(1, n_months+1), n_floorplans),
"product": np.tile(["A", "B", "C"], n_months),
"demand_A": np.random.poisson(100, n_obs),
"demand_B": np.random.poisson(80, n_obs),
"demand_C": np.random.poisson(60, n_obs),
"price_A": np.random.normal(1000, 50, n_obs),
"price_B": np.random.normal(1200, 70, n_obs),
"price_C": np.random.normal(1500, 100, n_obs),
}
df = pd.DataFrame(data)
The model computes alpha
and beta
to fit demand = exp(alpha + beta * (log(price) - log(avg_price)))
for each product.
And the seasonality is modeled using a Gaussian random walk. So, it’s an array of 12 values. Here is the full model:
import pymc as pm
import pytensor.tensor as pt
month_idx, month_labels = df["month_id"].factorize(sort=True)
product_idx, product_labels = df["product"].factorize(sort=True)
coords = {
"products": product_labels,
"months": month_labels,
}
with pm.Model(coords=coords) as funnel:
# define a monthly trend, as a random walk
month_effect = pm.GaussianRandomWalk('month_effect', init_dist=pm.Exponential.dist(0.25), dims="months")
# estimates the elasticity coefficients
alpha = pm.Cauchy('alpha', 1, 2, dims="products")
beta = pm.Cauchy('beta', -3, 2, dims="products")
price_columns = [f"price_{p_name}" for p_name in product_labels]
average_prices = df[price_columns].mean(axis=0)
logμ0 = alpha + (beta * (np.log(df[price_columns]) - np.log(average_prices)))
μ0 = np.exp(logμ0) + month_effect[month_idx]
demand_columns = [f"demand_{p_name}" for p_name in product_labels]
demand_obs = pm.Poisson('demand_obs', μ0, observed=df[demand_columns])
idata = pm.sample()
When I attempt to sample without + month_effect[month_idx]
, it works well.
Now, after adding that seasonality component back, it throws:
ValueError: Input dimension mismatch. One other input has shape[1] = 3, but input[3].shape[1] = 36.
The way I understand it, it should work because:
-
μ0 = np.exp(logμ0)
has as many items than the number of data points in the dataset, since it’s fed into a distribution withobserved
set to the dataset. -
month_effect[month_idx]
should have the same shape, sincemonth_idx
has as many items thandf.shape[0]
What am I missing here?
(pymc==5.1.2)