Hierarchical Modeling MMM with Geo-Data in PyMC

Hello!

I am a newbie to Bayesian modeling and PyMC. I recently switched from cmdstanr to PyMC and have been attempting to fit some expensive models. I want to develop an MMM with geometric adstock; my data are available at the DMA-day level (for non-marketers, this is like state-day data). I want to apply hierarchical modeling to my betas across DMAs for each channel, but fit a single global theta for the adstock transformation. ChatGPT has suggested I write a loop within my PyMC model, which looks something like this:

# Adstock transformation (PyTorch-based, as suggested by ChatGPT):
import torch

def adstock_transform(df, theta, max_lag, columns_to_transform):
    # Copy the DataFrame to avoid modifying the original data
    transformed_df = df.copy()
    
    # Ensure theta is a PyTorch tensor
    theta = torch.tensor(theta, dtype=torch.float32)

    # Process only specified columns
    for col in columns_to_transform:
        if col in df.columns:
            # Convert column to PyTorch tensor
            spend = torch.tensor(df[col].values, dtype=torch.float32)

            max_spend = torch.max(spend)
            if max_spend > 0:  # Avoid division by zero
                spend = spend / max_spend
            
            # Initialize the tensor for adstocked values
            adstocked_column = torch.zeros_like(spend)
            
            # Apply adstock transformation with no spill-over across DMAs
            for lag in range(max_lag + 1):
                # Create a shifted version of the spend tensor
                shifted_spend = torch.roll(spend, shifts=lag, dims=0)
                
                # Apply the decay factor raised to the power of the lag
                decay = torch.pow(theta, lag)
                
                # Zero out the values wrapped around by torch.roll so spend can't spill across series boundaries
                if lag > 0:
                    shifted_spend[:lag] = 0
                
                # Add to the adstocked spend for the column
                adstocked_column += decay * shifted_spend
            
            # Convert the adstocked tensor back to numpy and update the DataFrame
            transformed_df[col] = adstocked_column.numpy()
    
    return transformed_df

# Retrieve unique DMA codes and map each to an integer index
unique_dmas = df['dma_code'].unique()
dma_to_index = {dma: idx for idx, dma in enumerate(unique_dmas)}
    
# hierarchical model with looping through DMAs (geography)
with pm.Model() as dma_hierarchical_adstock_model:
    # Global model parameters
    intercept = pm.Normal('intercept', mu=0, sigma=1)
    sigma = pm.HalfNormal('sigma', sigma=10)
    theta = pm.Beta('theta', alpha=2, beta=2)  # Single global theta
    
    # Hierarchical betas for each DMA
    # Hyperpriors for the group means of betas
    mu_beta_con = pm.HalfNormal('mu_beta_con', sigma=1, shape=nX_direct)
    sigma_beta_con = pm.HalfNormal('sigma_beta_con', sigma=1, shape=nX_direct)
    mu_beta_uncon = pm.Normal('mu_beta_uncon', mu=0, sigma=1, shape=nX_unconstrained)
    sigma_beta_uncon = pm.HalfNormal('sigma_beta_uncon', sigma=1, shape=nX_unconstrained)
    
    # DMA-specific betas
    beta_con = pm.HalfNormal('beta_con', sigma=sigma_beta_con, shape=(len(unique_dmas), nX_direct))
    beta_uncon = pm.Normal('beta_uncon', mu=mu_beta_uncon, sigma=sigma_beta_uncon, shape=(len(unique_dmas), nX_unconstrained))
    
    # Process data for each DMA
    for dma in unique_dmas:
        dma_data = df[df['dma_code'] == dma]
        y = dma_data['Y'].values
        x_con = dma_data[rp_columns]  # keep as a DataFrame; adstock_transform expects named columns
        x_uncon = dma_data[['feelslike', 'precip', 'windspeed', 'light_hours', 'areasqmi', 'neighbor_shock']].values

        # Apply the adstock transform to the DMA's spend
        adstocked_spend = adstock_transform(x_con, theta, max_lag=max_lag, columns_to_transform=rp_columns)

        # DMA index for selecting specific betas
        dma_idx = dma_to_index[dma]

        # Expected sales: using dot product for matrix of predictors
        expected_sales = intercept + pm.math.dot(adstocked_spend.values, beta_con[dma_idx]) + pm.math.dot(x_uncon, beta_uncon[dma_idx])
        
        # Likelihood
        sales_obs = pm.Normal(f'sales_obs_{dma}', mu=expected_sales, sigma=sigma, observed=y)

Is this appropriate, or is this not actually estimating a hierarchical model? ChatGPT loves to create an easy way out that doesn't work, so I thought I'd ask the experts!

edit: I had incorrectly been estimating the effect of one beta instead of a vector.

Hello @morganstockham

Hope I'm following correctly: you want to create a hierarchical model at ("date", "geo") granularity? If so, I'd recommend the following.

First, take advantage of pymc-marketing and use any of the available adstocks :slight_smile:

from pymc_marketing.mmm.components.adstock import WeibullAdstock

adstock = WeibullAdstock(l_max=10, normalize=True)

You can choose among WeibullAdstock, DelayedAdstock, or GeometricAdstock :slight_smile:
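
Since the original question asks for geometric adstock with one global decay, here is a minimal sketch of that variant (assuming, as in current pymc-marketing, that GeometricAdstock exposes its decay parameter as alpha):

from pymc_marketing.mmm.components.adstock import GeometricAdstock

adstock = GeometricAdstock(l_max=10, normalize=True)

# Inside the model block, a single global decay shared by every channel and geo:
#     theta = pm.Beta("theta", alpha=2, beta=2)
#     contribution = adstock.function(x=x_data, alpha=theta)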

Assuming your dataframe has the following structure:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               30 non-null     datetime64[ns]
 1   geo                30 non-null     object        
 2   marketing_spend    30 non-null     float64       
 3   sales              30 non-null     float64       
 4   marketing_spend_2  30 non-null     float64       
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.3+ KB

Where marketing_spend_N is a column with the spend/impressions for a given channel.
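
The model below assumes a coordinates dict plus an X_data array with dims ("date", "channel", "geo") and a y array with dims ("date", "geo"), which are not shown here. A minimal sketch of one way to build them from the long dataframe above (the channel list and variable names are assumptions):

import xarray as xr

# Convert the long (date, geo) dataframe into an xarray Dataset
ds = df.set_index(["date", "geo"]).to_xarray()

channels = ["marketing_spend", "marketing_spend_2"]  # assumed channel columns

# Stack the channel columns into a single (date, channel, geo) DataArray
X_data = (
    xr.concat([ds[c] for c in channels], dim="channel")
    .assign_coords(channel=channels)
    .transpose("date", "channel", "geo")
)
y = ds["sales"]  # dims: (date, geo)

coordinates = {
    "date": X_data.coords["date"].values,
    "channel": channels,
    "geo": X_data.coords["geo"].values,
}

Then you can write your model as follows: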

with pm.Model(coords=coordinates) as hierarchical_model:
    x_data = pm.Data(
        "x_data",
        value=X_data.values,  # array with the shape specified in dims
        dims=("date", "channel", "geo")
    )

    y_data = pm.Data(
        "y_data",
        value=y.values,  # array with the shape specified in dims
        dims=("date", "geo")
    )

    intercept = pm.Gamma('intercept', mu=500, sigma=300, dims="geo")
    
    # hyper priors (global mu & sigma)
    lam_prior_sigma = pm.HalfNormal('lam_prior_sigma', sigma=100)
    lam_prior_mu = pm.Beta('lam_prior_mu', alpha=100, beta=200)
    # prior distribution -> channels share the same hyper priors
    lam = pm.Gamma('lam', mu=lam_prior_mu, sigma=lam_prior_sigma, dims=("channel", "geo"))

    # hyper priors
    k_prior_sigma = pm.HalfNormal('k_prior_sigma', sigma=500)
    k_prior_mu = pm.Beta('k_prior_mu', alpha=100, beta=200)
    # prior distribution
    k = pm.Gamma('k', mu=k_prior_mu, sigma=k_prior_sigma, dims=("channel", "geo"))
    
    # estimating contribution (only using the transformation to keep it simple)
    contribution = pm.Deterministic(
        name="contribution",
        var=adstock.function(x=x_data, lam=lam, k=k),
        dims=("date","channel", "geo")
    )

    yhat = pm.Deterministic(
        name="yhat", 
        var=(
            intercept + 
            contribution.sum(axis=1)  # sum over the channel axis
        ), 
        dims=("date", "geo")
    )

    sigma_likelihood = pm.HalfNormal("sigma_likelihood", sigma=200, dims="geo")
    nu = pm.Gamma(name="nu", alpha=400, beta=200, dims="geo")

    pm.StudentT(
        name="likelihood", 
        mu=yhat, 
        nu=nu, 
        sigma=sigma_likelihood, 
        dims=("date", "geo"), 
        observed=y_data
    )

After defining the model above, you can run pm.sample :v:t2:
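
For example (the sampler settings here are illustrative):

with hierarchical_model:
    idata = pm.sample(draws=1000, tune=1000, chains=4, target_accept=0.9)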

If you need to apply the transformation to some channels and not others, then it may be worth modifying the coordinates to something like:

coordinates = {
    "date": ...,
    "channel_type_a": ...,
    "channel_type_b": ...,
    "geo": ...,
}

Then you can simply split X_data accordingly:

    x_data_a = pm.Data(
        "x_data_a",
        value=X_data_a.values,  # array with the shape specified in dims
        dims=("date", "channel_type_a", "geo")
    )

    x_data_b = pm.Data(
        "x_data_b",
        value=X_data_b.values,  # array with the shape specified in dims
        dims=("date", "channel_type_b", "geo")
    )

This way, you can decide which data to apply the transformation to!
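
For instance, a sketch where only channel_type_a goes through the adstock (the prior names here are assumptions, by analogy with the model above):

    # Only channel_type_a is adstocked
    lam_a = pm.Gamma("lam_a", mu=lam_prior_mu, sigma=lam_prior_sigma, dims=("channel_type_a", "geo"))
    k_a = pm.Gamma("k_a", mu=k_prior_mu, sigma=k_prior_sigma, dims=("channel_type_a", "geo"))
    contribution_a = pm.Deterministic(
        name="contribution_a",
        var=adstock.function(x=x_data_a, lam=lam_a, k=k_a),
        dims=("date", "channel_type_a", "geo")
    )

    # channel_type_b enters the model without the transformation
    beta_b = pm.HalfNormal("beta_b", sigma=1, dims=("channel_type_b", "geo"))
    contribution_b = pm.Deterministic(
        name="contribution_b",
        var=x_data_b * beta_b,
        dims=("date", "channel_type_b", "geo")
    )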

I recommend reading the blog post from @twiecki about centered and non-centered hierarchies. The implementation shown here is a centered hierarchy.
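
For reference, the non-centered version of a Normal hierarchy looks roughly like this (a generic sketch, not tied to the parameters above):

    mu_beta = pm.Normal("mu_beta", mu=0, sigma=1)
    sigma_beta = pm.HalfNormal("sigma_beta", sigma=1)
    # Sample standardized offsets, then scale and shift; this often samples much better
    beta_offset = pm.Normal("beta_offset", mu=0, sigma=1, dims=("channel", "geo"))
    beta = pm.Deterministic("beta", mu_beta + sigma_beta * beta_offset, dims=("channel", "geo"))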

Also, take a look at ZeroSumNormal, which is a great alternative when you are dealing with hierarchical models where the number of parameters grows to the point of over-parameterization.
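
A minimal sketch of what that could look like (illustrative names only):

    # Geo deviations around a global channel effect, constrained to sum to zero across geos
    beta_global = pm.Normal("beta_global", mu=0, sigma=1, dims="channel")
    beta_geo = pm.ZeroSumNormal("beta_geo", sigma=0.5, dims=("channel", "geo"))
    beta = pm.Deterministic("beta", beta_global[:, None] + beta_geo, dims=("channel", "geo"))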


Thank you so much!!