Getting the same predictions when using the PyMC3 data container to generate Bayesian regression predictions from new data

I built a Bayesian regression model using the PyMC3 package, and I'm trying to generate predictions from new data. I used the data container pm.Data() to train the model on the training data, then passed the new data to pm.set_data() before calling pm.sample_posterior_predictive(). The predictions were what I would expect from the training data, not the new data.
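For context, the pattern I'm following is the usual pm.Data() / pm.set_data() round trip. Here is a minimal sketch of it with a toy one-feature model (the toy data and names are made up for illustration, not my actual model):

import numpy as np
import pymc3 as pm

X_train = np.random.randn(100)
y_train = 2.0 * X_train + np.random.randn(100)

with pm.Model() as toy_model:
    x = pm.Data('x', X_train)  # shared container, intended to be swappable
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.Exponential('sigma', 1.0)
    pm.Normal('y', mu=beta * x, sigma=sigma, observed=y_train)
    toy_trace = pm.sample()

# swap in new predictors of the same shape, then predict
with toy_model:
    pm.set_data({'x': np.random.randn(100)})
    toy_pred = pm.sample_posterior_predictive(toy_trace)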

Here’s my model:

import numpy as np
import pymc3 as pm
from pymc3 import Beta, Exponential, Gamma, HalfNormal, Model, Normal

df_train = df.drop(['Unnamed: 0', 'DATE_AT'], axis=1)

with Model() as model:
    response_mean = []
    x_ = pm.Data('features', df_train)  # a data container; can be changed later
    t = np.transpose(x_.get_value())
    
    # intercept
    y = Normal('y', mu=0, sigma=6000)
    response_mean.append(y)
    
    # channels that can have DECAY and SATURATION effects
    for channel_name in delay_channels:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)
        
        print(f'Adding Delayed Channels: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            channel_b = HalfNormal(f'beta_{channel_name}', sigma=s)
        else:
            channel_b = Normal(f'beta_{channel_name}', mu=c, sigma=s)
        
        alpha = Beta(f'alpha_{channel_name}', alpha=3, beta=3)
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(
            geometric_adstock_tt(xx, alpha), channel_mu) * channel_b)
    
    # channels that have SATURATION effects only
    for channel_name in non_lin_channels:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)
        
        print(f'Adding Non-Linear Logistic Channel: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            channel_b = HalfNormal(f'beta_{channel_name}', sigma=s)
        else:
            channel_b = Normal(f'beta_{channel_name}', mu=c, sigma=s)
        
        # logistic reach curve
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(xx, channel_mu) * channel_b)
        
    # continuous external features
    for channel_name in control_vars:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)

        print(f'Adding control: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            control_beta = HalfNormal(f'beta_{channel_name}', sigma=s)
        else:
            control_beta = Normal(f'beta_{channel_name}', mu=c, sigma=s)
            
        channel_contrib = control_beta * xx
        response_mean.append(channel_contrib)
        
    # categorical control variables
    for var_name in index_vars:
        i = df_train.columns.get_loc(var_name)
        shape = len(np.unique(t[i]))
        x = t[i].astype('int')
        
        print(f'Adding Index Variable: {var_name}')
        
        ind_beta = Normal(f'beta_{var_name}', sigma=6000, shape=shape)
        channel_contrib = ind_beta[x]
        response_mean.append(channel_contrib)
        
    # noise
    sigma = Exponential('sigma', 10)

    
    # define likelihood
    likelihood = Normal(outcome, mu=sum(response_mean), sigma=sigma, observed=df[outcome].values)
    
    trace = pm.sample(tune=3000, cores=4, init='advi')

Here are the betas from the model. Notice that ADWORDS_SEARCH is one of the most important features:

When I zeroed out the ADWORDS_SEARCH feature, I got practically identical predictions, which cannot be the case:

with model:
    y_pred = pm.sample_posterior_predictive(trace)
    
mod_channel = 'ADWORDS_SEARCH'
df_mod = df_train.copy(deep=True)
df_mod.iloc[12:-12, df_mod.columns.get_loc(mod_channel)] = 0  # zero out the channel (except the first and last 12 rows)

with model:
    pm.set_data({'features': df_mod})
    y_pred_mod = pm.sample_posterior_predictive(trace)
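For reference, this is roughly how I'm comparing the two runs (a sketch; in PyMC3, sample_posterior_predictive() returns a dict keyed by the observed variable's name, which here is the string stored in my outcome variable):

pred_orig = y_pred[outcome].mean(axis=0)     # per-observation posterior predictive mean
pred_mod = y_pred_mod[outcome].mean(axis=0)  # same, but with ADWORDS_SEARCH zeroed out
print(np.abs(pred_orig - pred_mod).max())    # essentially zero, which is the problem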

By zeroing out ADWORDS_SEARCH, I would expect the predictions to be significantly lower than the original predictions, since ADWORDS_SEARCH is one of the most important features according to the betas.

I started questioning the model, but it seems to perform well:

MAPE = 6.3%
R² = 0.7

I also tried passing the original training data set to pm.set_data() and got very similar results as well.

This is the difference between the predictions from the training data and from the new data:

This is the difference between the predictions from the training data and from the same training data passed through pm.set_data():

Does anyone know what I'm doing wrong?

Welcome!

Maybe it’s this? From the API docs:

"Since v4.1.0 the default value is mutable=False, with previous versions having mutable=True."

In general, I think the idiomatic approach is to use either pm.MutableData or pm.ConstantData to avoid the ambiguity associated with pm.Data.
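For example, in PyMC v4 (just the declarations; the names here are made up):

import numpy as np
import pymc as pm  # v4

with pm.Model():
    x = pm.MutableData('x', np.zeros(10))  # backed by a shared variable; pm.set_data() can replace it
    w = pm.ConstantData('w', np.ones(10))  # baked into the graph as a constant; cannot be swapped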


Thank you for getting back to me! I tried passing the mutable=True argument but got the error below. I'm on the newest version of PyMC3 (3.11.5), so I'm not sure why. pm.MutableData() didn't work either.



[screenshot of the error]

I would strongly recommend upgrading to version 4 if possible (current version is 4.4). Installation instructions are here.
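After upgrading, the round trip you're after would look roughly like this (a sketch with a toy model, using pm.MutableData so that pm.set_data() is guaranteed to work):

import numpy as np
import pymc as pm  # v4

X_train = np.random.randn(100)
y_train = 2.0 * X_train + np.random.randn(100)

with pm.Model() as m:
    x = pm.MutableData('x', X_train)
    beta = pm.Normal('beta', mu=0, sigma=10)
    sigma = pm.Exponential('sigma', 1.0)
    pm.Normal('y', mu=beta * x, sigma=sigma, observed=y_train)
    idata = pm.sample()

with m:
    pm.set_data({'x': np.random.randn(100)})  # new predictors, same shape
    pred = pm.sample_posterior_predictive(idata)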
