PyMC3 Model Fails to Converge - SamplingError: Initial evaluation of model at starting point failed!

I’ve created a Marketing Media Mix Model, applying techniques outlined in this post.

Although I have experience building machine learning models and am currently getting my Master’s in Data Science from UT Austin, this is the first PyMC3 model I’ve built. I’ve modeled the data distributions I encountered while working with the marketing team as a Data Scientist at a previous job, and applied the model described in the tutorial. Note that the data used in the tutorial is aggregated at the week level, while mine is reported by day. I’ve tried aggregating to week and am still getting the same “SamplingError: Initial evaluation of model at starting point failed!”
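For context, the weekly aggregation I tried was along these lines (a minimal sketch; it assumes df_in is indexed by a daily DatetimeIndex):

# roll daily spend and control columns up to weekly totals before building the model
df_weekly = df_in.resample("W").sum()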

Each row of my dataset corresponds to marketing spend per day, and each column corresponds to a marketing channel. I have the following columns:
- Channels with Adstock
'TV', 'Referral', 'DirectMail', 'TradeShows', 'SocialMedia', 'DisplayAds_Standard', 'ContentMarketing', 'GoogleAds', 'SEO', 'Email', 'AffiliateMarketing'
- Channels with Saturation Only
"DisplayAds_Programmatic"
- Continuous Control Variables
"TaxAct_Free", "TaxAct_FreePlusState", "TurboTax_Free", "TurboTax_FreePlusState", "TaxAct_TaxProAssist", "TurboTax_TaxProAssist", "TaxAct_TaxProAssistState", "TurboTax_TaxProAssistState", "TaxAct_Premium", "TurboTax_Premium", "TaxAct_PremiumState", "TurboTax_PremiumState"
Here is how the Adstock and Saturation Only Channels look:


I’ve outlined the response variable (y_obs) in pink.

Here is the code that applies the geometric adstock (delayed, decaying effect) and saturation transformations:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# marketing spend channels that commonly see a delayed impact
delay_channels = [
    'TV', 'Referral', 'DirectMail', 'TradeShows', 'SocialMedia', 'DisplayAds_Standard',
    'ContentMarketing', 'GoogleAds', 'SEO', 'Email', 'AffiliateMarketing',
]
# marketing spend channel less susceptible to adstock effects
non_lin_channels = ["DisplayAds_Programmatic"]
# pricing of TaxAct (which corresponds to spend data) and pricing of competitor (TurboTax)
control_vars = [
    "TaxAct_Free",
    "TaxAct_FreePlusState",
    "TurboTax_Free",
    "TurboTax_FreePlusState",
    "TaxAct_TaxProAssist",
    "TurboTax_TaxProAssist",
    "TaxAct_TaxProAssistState",
    "TurboTax_TaxProAssistState",
    "TaxAct_Premium",
    "TurboTax_Premium",
    "TaxAct_PremiumState",
    "TurboTax_PremiumState",
]
############################## geometric adstock function
def geometric_adstock_tt(
    x, alpha=0, L=12, normalize=True
):  # L = 12 days is the maximum delay (lag) we expect to see
    """
    The term "geometric" refers to the way weights are assigned to past values,
    which follows a geometric progression.
    In a geometric progression,
    each term is found by multiplying the previous term by a fixed, constant ratio (commonly denoted as "r").
    In the case of the geometric adstock function, the "alpha" parameter serves as this constant ratio.
    """
    # vector of weights assigned by decay rate alpha over the L-day (12-day) window
    w = tt.as_tensor_variable([tt.power(alpha, i) for i in range(L)])
    xx = tt.stack(
        [tt.concatenate([tt.zeros(i), x[: x.shape[0] - i]]) for i in range(L)]
    )

    if not normalize:
        y = tt.dot(w, xx)
    else:
        y = tt.dot(
            w / tt.sum(w), xx
        )  # dot product to get marketing channel over time frame of decay
    return y
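
# A quick sanity check of the adstock weighting (illustrative values, not from the original data):
# with alpha=0.5 and L=3 the normalized weights are [4/7, 2/7, 1/7], so a single spend impulse
# of 100 decays over the following days:
#   geometric_adstock_tt(np.array([0., 0., 100., 0., 0.]), alpha=0.5, L=3).eval()
#   # -> approximately [0., 0., 57.1, 28.6, 14.3]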


### non-linear saturation function
def logistic_function(x_t, mu=0.1):
    # apply a logistic saturation curve to the spend variable; mu controls how quickly it saturates
    return (1 - np.exp(-mu * x_t)) / (1 + np.exp(-mu * x_t))
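
# Sketch of the intended saturation behaviour, assuming the (1 - exp(-mu*x)) / (1 + exp(-mu*x))
# form from the referenced tutorial: the curve rises from 0 and flattens toward 1 as spend grows,
# with mu controlling how quickly the channel saturates. For example, with mu = 0.1:
#   x = 10  ->  (1 - e**-1) / (1 + e**-1) ~= 0.46
#   x = 50  ->  (1 - e**-5) / (1 + e**-5) ~= 0.99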


with pm.Model() as model:
    response_mean = []
    ################# ADSTOCK CHANNELS
    for channel_name in delay_channels:
        xx = df_in[channel_name].values
    
        print(f"Adding Delayed Channels: {channel_name}")
        channel_b = pm.HalfNormal(f"beta_{channel_name}", sd=5)  # DELAYED EFFECT
    
        alpha = pm.Beta(f"alpha_{channel_name}", alpha=3, beta=3)  # ADSTOCK EFFECT
        channel_mu = pm.Gamma(
            f"mu_{channel_name}", alpha=3, beta=1
        )  # SATURATION EFFECT
        response_mean.append(
            logistic_function(geometric_adstock_tt(xx, alpha), channel_mu) * channel_b
        )
    ################# SATURATION ONLY
    for channel_name in non_lin_channels:
        xx = df_in[channel_name].values
        print(f"Adding Non-Linear Logistic Channel: {channel_name}")
        channel_b = pm.HalfNormal(f"beta_{channel_name}", sd=5)  # CHANNEL COEFFICIENT

        channel_mu = pm.Gamma(
            f"mu_{channel_name}", alpha=3, beta=1
        )  # SATURATION EFFECT
        response_mean.append(logistic_function(xx, channel_mu) * channel_b)
    ################ CONTINUOUS CONTROL VARIABLES (monetary values)
    if control_vars:
        for channel_name in control_vars:
            x = df_in[channel_name].values
            print(f"Adding Control: {channel_name}")

            control_beta = pm.Normal(f"beta_{channel_name}", sd=0.25)  # prior on the control coefficient
            channel_contrib = control_beta * x  # linear contribution of the control variable
            response_mean.append(channel_contrib)
    #################### likelihood (y_hat)

    # I thought changing sigma may help the model converge, but it did not help:)
    sigma = pm.HalfNormal("sigma", sigma=1)
    # sigma = pm.Exponential('sigma', 3)
    # y_hat = pm.Normal("Y", mu=sum(response_mean), sigma=sigma, observed=y_obs)
    y_hat = pm.Weibull("Y", alpha=3, beta=sum(response_mean), observed=y_obs)

I’ve investigated these transformations and consulted an expert and don’t feel that this is the issue.

This leads me to conclude that the reason my model won’t converge is the likelihood distribution I specified. I initially specified the likelihood as a Normal distribution; however, here is how my y_obs is actually distributed:

So I tried a Weibull distribution. I’m not sure whether I’m specifying alpha and beta correctly.
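For reference, my understanding is that PyMC3’s Weibull uses alpha as the shape and beta as the scale, which should match scipy’s weibull_min with c=alpha and scale=beta. A minimal sketch with illustrative values (not values from my model):

import pymc3 as pm
from scipy import stats

# both lines should print (roughly) the same log-density if alpha = shape and beta = scale
print(pm.Weibull.dist(alpha=3.0, beta=2.0).logp(1.5).eval())
print(stats.weibull_min.logpdf(1.5, c=3.0, scale=2.0))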

The other issue, and my main question, has to do with specifying the starting values. This code:

with model:
    trace = pm.sample()

generates this error:

---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
Cell In[70], line 2
      1 with model:
----> 2     trace = pm.sample()

File ~/Library/Python/3.9/lib/python/site-packages/deprecat/classic.py:215, in deprecat.<locals>.wrapper_function(wrapped_, instance_, args_, kwargs_)
    213         else:
    214             warnings.warn(message, category=category, stacklevel=_routine_stacklevel)
--> 215 return wrapped_(*args_, **kwargs_)

File ~/Library/Python/3.9/lib/python/site-packages/pymc3/sampling.py:444, in sample(draws, step, init, n_init, initvals, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, callback, jitter_max_retries, start, return_inferencedata, idata_kwargs, mp_ctx, pickle_backend, **kwargs)
    442 start = deepcopy(start)
    443 if start is None:
--> 444     check_start_vals(model.test_point, model)
    445 else:
    446     if isinstance(start, dict):

File ~/Library/Python/3.9/lib/python/site-packages/pymc3/util.py:237, in check_start_vals(start, model)
    234 initial_eval = model.check_test_point(test_point=elem)
    236 if not np.all(np.isfinite(initial_eval)):
--> 237     raise SamplingError(
    238         "Initial evaluation of model at starting point failed!\n"
    239         "Starting values:\n{}\n\n"
    240         "Initial evaluation results:\n{}".format(elem, str(initial_eval))
    241     )

SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'beta_TV_log__': array(1.38364656), 'alpha_TV_logodds__': array(0.), 'mu_TV_log__': array(1.09861229), 'beta_Referral_log__': array(1.38364656), 'alpha_Referral_logodds__': array(0.), 'mu_Referral_log__': array(1.09861229), 'beta_DirectMail_log__': array(1.38364656), 'alpha_DirectMail_logodds__': array(0.), 'mu_DirectMail_log__': array(1.09861229), 'beta_TradeShows_log__': array(1.38364656), 'alpha_TradeShows_logodds__': array(0.), 'mu_TradeShows_log__': array(1.09861229), 'beta_SocialMedia_log__': array(1.38364656), 'alpha_SocialMedia_logodds__': array(0.), 'mu_SocialMedia_log__': array(1.09861229), 'beta_DisplayAds_Standard_log__': array(1.38364656), 'alpha_DisplayAds_Standard_logodds__': array(0.), 'mu_DisplayAds_Standard_log__': array(1.09861229), 'beta_ContentMarketing_log__': array(1.38364656), 'alpha_ContentMarketing_logodds__': array(0.), 'mu_ContentMarketing_log__': array(1.09861229), 'beta_GoogleAds_log__': array(1.38364656), 'alpha_GoogleAds_logodds__': array(0.), 'mu_GoogleAds_log__': array(1.09861229), 'beta_SEO_log__': array(1.38364656), 'alpha_SEO_logodds__': array(0.), 'mu_SEO_log__': array(1.09861229), 'beta_Email_log__': array(1.38364656), 'alpha_Email_logodds__': array(0.), 'mu_Email_log__': array(1.09861229), 'beta_AffiliateMarketing_log__': array(1.38364656), 'alpha_AffiliateMarketing_logodds__': array(0.), 'mu_AffiliateMarketing_log__': array(1.09861229), 'sigma_log__': array(-0.22579135)}

Initial evaluation results:
beta_TV_log__                         -0.77
alpha_TV_logodds__                    -0.76
mu_TV_log__                           -0.40
beta_Referral_log__                   -0.77
alpha_Referral_logodds__              -0.76
mu_Referral_log__                     -0.40
beta_DirectMail_log__                 -0.77
alpha_DirectMail_logodds__            -0.76
mu_DirectMail_log__                   -0.40
beta_TradeShows_log__                 -0.77
alpha_TradeShows_logodds__            -0.76
mu_TradeShows_log__                   -0.40
beta_SocialMedia_log__                -0.77
alpha_SocialMedia_logodds__           -0.76
mu_SocialMedia_log__                  -0.40
beta_DisplayAds_Standard_log__        -0.77
alpha_DisplayAds_Standard_logodds__   -0.76
mu_DisplayAds_Standard_log__          -0.40
beta_ContentMarketing_log__           -0.77
alpha_ContentMarketing_logodds__      -0.76
mu_ContentMarketing_log__             -0.40
beta_GoogleAds_log__                  -0.77
alpha_GoogleAds_logodds__             -0.76
mu_GoogleAds_log__                    -0.40
beta_SEO_log__                        -0.77
alpha_SEO_logodds__                   -0.76
mu_SEO_log__                          -0.40
beta_Email_log__                      -0.77
alpha_Email_logodds__                 -0.76
mu_Email_log__                        -0.40
beta_AffiliateMarketing_log__         -0.77
alpha_AffiliateMarketing_logodds__    -0.76
mu_AffiliateMarketing_log__           -0.40
sigma_log__                           -0.77
Y                                      -inf
Name: Log-probability of test_point, dtype: float64

I don’t understand how to read the initial evaluation results shown here, e.g. how do I interpret a mu_SocialMedia_log__ of -0.40? Is there an optimal value I should be shooting for here? If so, what would that be? I’ve tried setting all of these starting values to -20, 0, and 1. Should I be setting them all to the same value, or should I set values according to whether the variable is the result of a mu, beta, or alpha transformation, i.e. set the start values for beta_Email_log__ and beta_AffiliateMarketing_log__ to 1 and the start values for mu_Email_log__ and mu_AffiliateMarketing_log__ to 2?
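Concretely, this is roughly how I’ve been setting the starting values, keyed by the transformed variable names from the error message (a sketch with an illustrative value; as noted above I’ve tried -20, 0, and 1):

with model:
    start = model.test_point.copy()      # default starting values for every free variable
    start["mu_TV_log__"] = np.log(0.05)  # illustrative override only
    trace = pm.sample(start=start)       # the sample() signature also accepts initvals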

These are my questions:)
Does anyone have any insights? I’ve been trying to troubleshoot this for quite a while… I would be so grateful for any help.

You should try a more recent version of PyMC if you can. In any case, your error says you have a value or parameter for y that is impossible.
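Those per-variable numbers in the “Initial evaluation results” are just the log-probability of each prior at the default starting point; finite values like -0.40 are fine, and the only real problem is the -inf on Y. A Weibull likelihood has no support for non-positive observations (and it also breaks if its scale, here sum(response_mean), is zero or non-finite at the starting point), so a quick first check on the observed data would be something like this sketch (using the y_obs from your model):

import numpy as np

# any non-positive or non-finite observation makes the Weibull log-likelihood -inf
y = np.asarray(y_obs)
print("non-positive y_obs values:", np.sum(y <= 0))
print("non-finite y_obs values:", np.sum(~np.isfinite(y)))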

OK, thank you! I appreciate your input. I’ll take a look at my target variable and try a more recent version.