Hierarchical Modeling MMM with Geo-Data in PyMC

Hello @morganstockham

Hope, I’m following correctly but you want to create a hierarchical model with a ("date","geo") granularity? If so. I’ll recommend you the following.

First, take advantage from pymc-marketing and implement any of the available adstocks :slight_smile:

from pymc_marketing.mmm.components.adstock import WeibullAdstock

adstock = WeibullAdstock(l_max=10, normalize=True)

You can choose between WeibullAdstock, DelayedAdstock or GeometricAdstock :slight_smile:

Assuming your dataframe has the following structure:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype         
---  ------             --------------  -----         
 0   date               30 non-null     datetime64[ns]
 1   geo                30 non-null     object        
 2   marketing_spend    30 non-null     float64       
 3   sales              30 non-null     float64       
 4   marketing_spend_2  30 non-null     float64       
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.3+ KB

Where marketing_spend_N is a column with the spend/impression of certain channel. Then you can write your model as follow:

with pm.Model(coords=coordinates) as hierarchical_model:
    x_data = pm.Data(
        "x_data",
        value=X_data.values, #transform your dataframe into an array with the shape specify in the dims
        dims=("date", "channel", "geo")
    )

    y_data = pm.Data(
        "y_data",
        value=y.values, #transform your dataframe into an array with the shape specify in the dims
        dims=("date", "geo")
    )

    intercept = pm.Gamma('intercept', mu=500, sigma=300, dims="geo")
    
    #hyper priors (global mu & sigma)
    lam_prior_sigma = pm.HalfNormal('lam_prior_sigma', sigma=100,)
    lam_prior_mu = pm.Beta('lam_prior_mu', alpha=100, beta=200,)
    # prior distribution -> channels share the same hyper priors
    lam = pm.Gamma('lam', mu=lam_prior_mu, sigma=lam_prior_sigma, dims=("channel", "geo"))
    
    #hyper priors
    k_prior_sigma = pm.HalfNormal('k_prior_sigma', sigma=500,)
    k_prior_mu = pm.Beta('k_prior_mu', alpha=100, beta=200,)
    # prior dist
    k = pm.Gamma('k', mu=k_prior_mu, sigma=k_prior_sigma, dims=("channel", "geo"))
    
    # estimating contribution (only using the transformation to keep it simple)
    contribution = pm.Deterministic(
        name="contribution",
        var=adstock.function(x=x_data, lam=lam, k=k),
        dims=("date","channel", "geo")
    )

    yhat = pm.Deterministic(
        name="yhat", 
        var=(
            intercept + 
            contribution.sum(axis=1)
        ), 
        dims=("date", "geo")
    )

    sigma_likelihood = pm.HalfNormal("sigma_likelihood", sigma=200, dims="geo")
    nu = pm.Gamma(name="nu", alpha=400, beta=200, dims="geo")

    pm.StudentT(
        name="likelihood", 
        mu=yhat, 
        nu=nu, 
        sigma=sigma_likelihood, 
        dims=("date", "geo"), 
        observed=y_data
    )

After applying the following model, you can run pm.sample :v:t2:

If you need to apply transformation for some channels and not others then maybe worth to modify the coordinates to be something like:

coordinates = {
"date":...,
"channel_type_a":...,
"channel_type_b":...,
"geo":...
}

Then you can simply modify the shape of the X_data.

    x_data_a = pm.Data(
        "x_data_a",
        value=X_data_a.values, #transform your dataframe into an array with the shape specify in the dims
        dims=("date", "channel_type_a", "geo")
    )

    x_data_b = pm.Data(
        "x_data_b",
        value=X_data_b.values, #transform your dataframe into an array with the shape specify in the dims
        dims=("date", "channel_type_b", "geo")
    )

Using this way, you can decide to what data apply the transformation!

I recommend you to read the block from @twiecki about centered and non-centered hierarchies. The implementation expose here is a centered hierarchy.

As well, take a look to ZeroSumNormal which is a great alternative when you are leading with hierarchical model, where the number of parameters increase to a point of over-parametrization.

4 Likes