Adapting adstock transformation (media mix modelling) to hierarchical model

I’d like to do an adstock transformation (an econometric transformation used in advertising effectiveness modelling) within pm.Deterministic().

There is lots of resources on how to do it (A Bayesian Approach to Media Mix Modeling by Michael Johns & Zhenyu Wang) but I couldn’t find anything on how to incorporate into a hierarchical model. Well, there is this article (https://towardsdatascience.com/bayesian-hierarchical-marketing-mix-modeling-in-pymc-684f6024e57a) but it does a for loop within pm.Model() which I tried but it makes it very slow (8 hours 32,000 draws).

My model looks like this:

coords = {"group": group_labels}
  with pm.Model(coords=coords) as hierarchical:
  # Hyperpriors
  alpha_mu = pm.HalfNormal("alpha_mu", sigma=1)
  beta_mu = pm.HalfNormal("beta_mu", sigma=1)
  sigma_mu = pm.HalfNormal("sigma_mu", sigma=1)
  ...
  # Priors
  theta = pm.Beta("theta", a=alpha_mu, b=beta_mu, dims="groups")
  b_spend = pm.HalfNormal("b_spend", sigma=sigma_mu, dims="groups")
  ...
  # Data
  group_idx = pm.MutableData("group", df["group_idx"], dims="obs_id")
  spend = pm.MutableData("spend", df["spend"], dims="obs_id")
  ...
  # Model
  adstock = pm.Deterministic("spend_part", geometric_adstock(spend, theta) * b_spend[group_idx], dims="obs_id")
  ...
  # Likelihood
  ...

I took the geometric_adstock from here.

def geometric_adstock(x, alpha: float = 0.0, l_max: int = 12):
    """Geometric adstock transformation."""
    cycles = [
        at.concatenate(
            [at.zeros(i), x[: x.shape[0] - i]]
        )
        for i in range(l_max)
    ]
    x_cycle = at.stack(cycles)
    w = at.as_tensor_variable([at.power(alpha, i) for i in range(l_max)])
    return at.dot(w, x_cycle)

But of course it doesn’t work in my model.

I’m not an expert of vectorized operations nor linear algebra so this was my best attempt to make it work.

import aesara.tensor as at
import aesara

def at_adstock(x, theta, n_groups, max_len=12):
    x_ = x.reshape((n_groups, -1))
    X = at.stack(
        [
            at.stack(
                [
                    at.concatenate([np.zeros(i), x_[r, : x_.shape[1] - i]])
                    for i in range(max_len)
                ]
            )
            for r in range(n_groups)
        ]
    )
    W = at.transpose(at.stack([at.power(theta, i) for i in range(max_len)]))
    adstock_x = at.transpose(at.diagonal(at.dot(W, X), axis1=0, axis2=1))
    return adstock_x.reshape((1, -1))

It’s ugly but it does work I think based on this toy data

# one theta per group
theta = np.array([
    0.5,
    0.25,
    0.5,
    0.1
])
# 4 obs per group
x = np.array([
    0, 100, 0, 0, 0, 50, 0, 0, 0, 100, 0, 0, 100, 0, 0, 0
])
s = at.vector("s")
t = at.vector("t")
f_adstock = aesara.function([s, t], at_adstock(s, t, n_groups=4, max_len=3))
f_adstock(x, theta)
# array([[  0.   , 100.   ,  50.   ,  25.   ,   
#           0.   ,  50.   ,  12.5  , 3.125,   
#           0.   , 100.   ,  50.   ,  25.   , 
#         100.   ,  10.   , 1.   ,   0.   ]])

But I do wonder what’s the best or better ways of doing it.

UPDATE: I’ve noticed that because of my implementation of at_adstock, pm.sample() automatically switches to Metropolis for “theta” and the resulting trace is non-sensical. When I force the step to be NUTS I get an error:
NullTypeGradError: grad encountered a NaN. This variable is Null because the grad method for input 0 (Reshape{3}.0) of the ExtractDiag{offset=0, axis1=0, axis2=1, view=False} op is not implemented. BTW I’ve replaced at.diagonal with pymc.math.extract_diag hoping it would fix it. It didn’t. I know this is all due to my dirty implementation of at_adstock. I just can’t think of a better way of doing it.