Hierarchical Modeling MMM with Geo-Data in PyMC


I am a newbie to Bayesian modeling and PyMC. I recently switched from cmdstanr to PyMC and have been attempting to fit some expensive models. I want to develop an MMM with geometric adstock of my data, which is available at the DMA-day level (for non-marketers, this is like state-day data). I want to apply hierarchical modeling to the betas across DMAs for each channel, but fit a single global theta for the adstock transformation. ChatGPT suggested I write a loop within my PyMC model that looks something like this:

import pandas as pd
import pymc as pm
import torch

# Adstock transformation:
def adstock_transform(df, theta, max_lag, columns_to_transform):
    # Copy the DataFrame to avoid modifying the original data
    transformed_df = df.copy()
    # Ensure theta is a PyTorch tensor
    theta = torch.tensor(theta, dtype=torch.float32)

    # Process only specified columns
    for col in columns_to_transform:
        if col in df.columns:
            # Convert column to PyTorch tensor
            spend = torch.tensor(df[col].values, dtype=torch.float32)

            max_spend = torch.max(spend)
            if max_spend > 0:  # Avoid division by zero
                spend = spend / max_spend
            # Initialize the tensor for adstocked values
            adstocked_column = torch.zeros_like(spend)
            # Apply adstock transformation with no spill-over across DMAs
            for lag in range(max_lag + 1):
                # Create a shifted version of the spend tensor
                shifted_spend = torch.roll(spend, shifts=lag, dims=0)
                # Apply the decay factor raised to the power of the lag
                decay = torch.pow(theta, lag)
                # Ensure no spill-over: zero out the values that cross DMA boundaries
                if lag > 0:
                    shifted_spend[:lag] = 0
                # Add to the adstocked spend for the column
                adstocked_column += decay * shifted_spend
            # Convert the adstocked tensor back to numpy and update the DataFrame
            transformed_df[col] = adstocked_column.numpy()
    return transformed_df
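For reference, the loop above is computing the standard geometric adstock, adstocked[t] = sum over lags l of theta**l * spend[t-l]. A minimal NumPy version (the `geometric_adstock` helper here is just illustrative, not part of the model) makes the intended output easy to check:

```python
import numpy as np

def geometric_adstock(spend, theta, max_lag):
    # adstocked[t] = sum_{l=0}^{max_lag} theta**l * spend[t - l]
    adstocked = np.zeros_like(spend, dtype=float)
    for lag in range(max_lag + 1):
        shifted = np.roll(spend, lag).astype(float)
        shifted[:lag] = 0.0  # no carry-over from before the series starts
        adstocked += theta ** lag * shifted
    return adstocked

print(geometric_adstock(np.array([100.0, 0.0, 0.0]), theta=0.5, max_lag=2))
# [100.  50.  25.]
```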
# Retrieve unique DMA codes
unique_dmas = df['dma_code'].unique()
dma_to_index = {dma: idx for idx, dma in enumerate(unique_dmas)}
# hierarchical model with looping through DMAs (geography)
with pm.Model() as dma_hierarchical_adstock_model:
    # Global model parameters
    intercept = pm.Normal('intercept', mu=0, sigma=1)
    sigma = pm.HalfNormal('sigma', sigma=10)
    theta = pm.Beta('theta', alpha=2, beta=2)  # Single global theta
    # Hierarchical betas for each DMA
    # Hyperpriors for the group means of betas
    mu_beta_con = pm.HalfNormal('mu_beta_con', sigma=1, shape=nX_direct)
    sigma_beta_con = pm.HalfNormal('sigma_beta_con', sigma=1, shape=nX_direct)
    mu_beta_uncon = pm.Normal('mu_beta_uncon', mu=0, sigma=1, shape=nX_unconstrained)
    sigma_beta_uncon = pm.HalfNormal('sigma_beta_uncon', sigma=1, shape=nX_unconstrained)
    # DMA-specific betas
    beta_con = pm.HalfNormal('beta_con', sigma=sigma_beta_con, shape=(len(unique_dmas), nX_direct))
    beta_uncon = pm.Normal('beta_uncon', mu=mu_beta_uncon, sigma=sigma_beta_uncon, shape=(len(unique_dmas), nX_unconstrained))
    # Process data for each DMA
    for dma in unique_dmas:
        dma_data = df[df['dma_code'] == dma]
        y = dma_data['Y'].values
        x_con = dma_data[rp_columns]  # spend columns needing adstock; adstock_transform expects a DataFrame
        x_uncon = dma_data[['feelslike', 'precip', 'windspeed', 'light_hours', 'areasqmi', 'neighbor_shock']].values

        # Apply the adstock transform to the DMA's spend
        adstocked_spend = adstock_transform(x_con, theta, max_lag=max_lag, columns_to_transform=rp_columns)

        # DMA index for selecting specific betas
        dma_idx = dma_to_index[dma]

        # Expected sales: using dot product for matrix of predictors
        expected_sales = intercept + pm.math.dot(adstocked_spend, beta_con[dma_idx]) + pm.math.dot(x_uncon, beta_uncon[dma_idx])
        # Likelihood
        sales_obs = pm.Normal(f'sales_obs_{dma}', mu=expected_sales, sigma=sigma, observed=y)

Is this appropriate, or is this not actually estimating a hierarchical model? ChatGPT loves to take an easy way out that doesn't work, so I thought I'd ask the experts!

edit: had incorrectly been estimating the effect of one beta instead of a vector.

Hello @morganstockham

Hope I'm following correctly, but you want to create a hierarchical model with ("date", "geo") granularity? If so, I'd recommend the following.

First, take advantage of pymc-marketing and use any of the available adstocks :slight_smile:

from pymc_marketing.mmm.components.adstock import WeibullAdstock

adstock = WeibullAdstock(l_max=10, normalize=True)

You can choose between WeibullAdstock, DelayedAdstock or GeometricAdstock :slight_smile:
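A side note on `normalize=True`: the adstock weights are rescaled to sum to one, so the transformation redistributes spend over time without inflating its total. A quick NumPy illustration with geometric weights (the `theta` and `l_max` values here are just examples):

```python
import numpy as np

theta, l_max = 0.5, 10
weights = theta ** np.arange(l_max + 1)   # geometric decay weights
normalized = weights / weights.sum()      # normalize=True rescales weights to sum to 1
print(round(normalized.sum(), 6))         # 1.0
```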

Assuming your dataframe has the following structure:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               30 non-null     datetime64[ns]
 1   geo                30 non-null     object        
 2   marketing_spend    30 non-null     float64       
 3   sales              30 non-null     float64       
 4   marketing_spend_2  30 non-null     float64       
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.3+ KB

Where marketing_spend_N is a column with the spend/impressions of a certain channel. Then you can write your model as follows:
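Getting from the long DataFrame above to an array with ("date", "channel", "geo") dims can be done with a pivot. A sketch with toy data, assuming the column names from the `df.info()` output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"]),
    "geo": ["A", "B", "A", "B"],
    "marketing_spend": [1.0, 2.0, 3.0, 4.0],
    "marketing_spend_2": [5.0, 6.0, 7.0, 8.0],
})
channels = ["marketing_spend", "marketing_spend_2"]

# Pivot to (date) x (channel, geo) columns, then reshape to (date, channel, geo)
wide = df.pivot(index="date", columns="geo", values=channels)
X = wide.to_numpy().reshape(len(wide), len(channels), df["geo"].nunique())
print(X.shape)  # (2, 2, 2)
```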

coordinates = {"date": dates, "geo": geos, "channel": channels}  # unique dates, geo labels, channel names

with pm.Model(coords=coordinates) as hierarchical_model:
    x_data = pm.Data(
        "x_data",
        X_data.values,  # transform your dataframe into an array with the shape specified in dims
        dims=("date", "channel", "geo"),
    )
    y_data = pm.Data(
        "y_data",
        y.values,
        dims=("date", "geo"),
    )

    intercept = pm.Gamma('intercept', mu=500, sigma=300, dims="geo")

    # hyper priors (global mu & sigma)
    lam_prior_sigma = pm.HalfNormal('lam_prior_sigma', sigma=100)
    lam_prior_mu = pm.Beta('lam_prior_mu', alpha=100, beta=200)
    # prior distribution -> channels share the same hyper priors
    lam = pm.Gamma('lam', mu=lam_prior_mu, sigma=lam_prior_sigma, dims=("channel", "geo"))

    # hyper priors
    k_prior_sigma = pm.HalfNormal('k_prior_sigma', sigma=500)
    k_prior_mu = pm.Beta('k_prior_mu', alpha=100, beta=200)
    # prior dist
    k = pm.Gamma('k', mu=k_prior_mu, sigma=k_prior_sigma, dims=("channel", "geo"))

    # estimating contribution (only using the transformation to keep it simple)
    contribution = pm.Deterministic(
        'contribution',
        var=adstock.function(x=x_data, lam=lam, k=k),
        dims=("date", "channel", "geo"),
    )

    yhat = pm.Deterministic(
        'yhat',
        intercept + contribution.sum(axis=1),  # sum contributions over the channel dim
        dims=("date", "geo"),
    )

    sigma_likelihood = pm.HalfNormal("sigma_likelihood", sigma=200, dims="geo")
    nu = pm.Gamma(name="nu", alpha=400, beta=200, dims="geo")

    likelihood = pm.StudentT(
        "likelihood",
        nu=nu, mu=yhat, sigma=sigma_likelihood,
        observed=y_data,
        dims=("date", "geo"),
    )

After building the model, you can run pm.sample :v:t2:

If you need to apply the transformation to some channels and not others, then it may be worth modifying the coordinates to something like:

coordinates = {
    "date": dates,
    "geo": geos,
    "channel_type_a": channels_a,  # channels that get the transformation
    "channel_type_b": channels_b,  # channels that do not
}

Then you can simply modify the shape of the X_data.

    x_data_a = pm.Data(
        "x_data_a",
        X_data_a.values,  # transform your dataframe into an array with the shape specified in dims
        dims=("date", "channel_type_a", "geo"),
    )
    x_data_b = pm.Data(
        "x_data_b",
        X_data_b.values,  # transform your dataframe into an array with the shape specified in dims
        dims=("date", "channel_type_b", "geo"),
    )

This way, you can decide which data to apply the transformation to!

I recommend you read the blog post from @twiecki about centered and non-centered hierarchies. The implementation shown here is a centered hierarchy.
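For context, the non-centered trick rewrites `beta ~ Normal(mu, sigma)` as `beta = mu + sigma * z` with `z ~ Normal(0, 1)`; both describe the same distribution, but the latter often samples much better in hierarchies. A quick NumPy sanity check of the equivalence (mu and sigma are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 2.0, 0.5, 200_000

centered = rng.normal(mu, sigma, size=n)   # beta ~ Normal(mu, sigma)
z = rng.normal(0.0, 1.0, size=n)
non_centered = mu + sigma * z              # beta = mu + sigma * z

# Same mean and spread, up to Monte Carlo error
print(round(centered.mean(), 1), round(non_centered.mean(), 1))  # 2.0 2.0
```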

Also, take a look at ZeroSumNormal, which is a great alternative when you are dealing with hierarchical models where the number of parameters increases to the point of over-parametrization.


Thank you so much!!