In the end it just comes down to preference I suppose. You can still iterate over the columns of a MutableData object using range. I’d split df into features and controls before modeling, make each its own data container then do the looping as follows (warning: untested code)
df_features = df[feature_vars]
n_obs, n_features = df_features.shape
coords = {'features':feature_vars,
'all_vars':feature_vars}
if control_vars:
df_controls = df[control_vars]
_, n_controls = df_controls.shape
coords.update({'controls':control_vars,
'all_vars':feature_vars + control_vars})
with pm.Model(coords=coords) as basic_model:
X = pm.MutableData('feature_data', df_features.values)
y = pm.MutableData('targets', df.target.values)
n_obs = X.shape[0]
betas = pm.HalfNormal('beta', sigma = 2, dims=['features'])
decays = pm.Beta('decay', alpha=3, beta=3, dims=['features'])
contributions = pt.zeros((n_obs, n_features + n_controls))
for i in range(n_features):
x = geometric_adstock_tt(X[:, i], decays[i]))*beta[i]
contributions = pt.set_subtensor(contributions[:, i], x)
if n_controls > 0:
Z = pm.MutableData('control_data', df_controls.values)
control_betas = pm.Normal('control_beta', sigma = 2, dims=['controls'])
pt.set_subtensor(contributions[:, n_features:], Z @ control_betas)
pm.Deterministic("contributions", contributions)
sigma = pm.HalfNormal('sigma', sigma=1)
y_hat= pm.Normal("y_hat", mu=contributions.sum(axis=-1), sigma=sigma, observed=y, shape=X.shape[0])
idata = pm.sample(idata_kwargs={'dims':{'contributions':[None, 'all_vars']}})
It’s basically the same as yours but I use the data containers. I also wanted to use coords to tag the different dimensions instead of f-strings – this is helpful when plotting with arviz, because it makes it easy to ask for a subset of coordinates. One wrinkle here was that I don’t know a way to only label a subset of the dimensions – for contributions, we want to label only the 2nd dimension (since the 1st dimension is a batch dim). As a result, I had to include the idata_kwargs argument in pm.sample to get the right coords on contributions.
Again, this is all just a matter of preference. Another way to do it would be to just include pm.MutableData in your loops, and use an f-string to name each mutable data object. The downside of this approach is that you will need to include each variable individually when you call pm.set_data, though admittedly you could write a function to ease that pain point.
I also didn’t look very carefully at the geometric_adstock_tt function, but I’m sure you could vectorize it with a bit of work. That would be my personal approach, but yet again, I emphasize that these are mostly style choices.
Also, have you checked out pymc-marketing? It’s a new project that is focusing on mixed-media models. It might be useful in your case.