I built a Bayesian regression with the PyMC3 package, and I’m trying to generate predictions from new data. I used the data container pm.Data() to train the model on the training data, then passed the new data to pm.set_data() before calling pm.sample_posterior_predictive(). The predictions are what I would expect from the training data, not from the new data.
Here’s my model:
df_train = df.drop(['Unnamed: 0', 'DATE_AT'], axis=1)

with Model() as model:
    response_mean = []
    x_ = pm.Data('features', df_train)  # a data container, can be changed
    t = np.transpose(x_.get_value())

    # intercept
    y = Normal('y', mu=0, sigma=6000)
    response_mean.append(y)

    # channels that can have DECAY and SATURATION effects
    for channel_name in delay_channels:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)
        print(f'Adding Delayed Channels: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            channel_b = HalfNormal(f'beta_{channel_name}', sd=s)
        else:
            channel_b = Normal(f'beta_{channel_name}', mu=c, sigma=s)
        alpha = Beta(f'alpha_{channel_name}', alpha=3, beta=3)
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(
            geometric_adstock_tt(xx, alpha), channel_mu) * channel_b)

    # channels that have SATURATION effects only
    for channel_name in non_lin_channels:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)
        print(f'Adding Non-Linear Logistic Channel: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            channel_b = HalfNormal(f'beta_{channel_name}', sd=s)
        else:
            channel_b = Normal(f'beta_{channel_name}', mu=c, sigma=s)
        # logistic reach curve
        channel_mu = Gamma(f'mu_{channel_name}', alpha=3, beta=1)
        response_mean.append(logistic_function(xx, channel_mu) * channel_b)

    # continuous external features
    for channel_name in control_vars:
        i = df_train.columns.get_loc(channel_name)
        xx = t[i].astype(float)
        print(f'Adding control: {channel_name}')
        c = coef.loc[coef['features']==channel_name, 'coef'].values[0]
        s = abs(c*0.015)
        if c <= 0:
            control_beta = HalfNormal(f'beta_{channel_name}', sd=s)
        else:
            control_beta = Normal(f'beta_{channel_name}', mu=c, sigma=s)
        channel_contrib = control_beta * xx
        response_mean.append(channel_contrib)

    # categorical control variables
    for var_name in index_vars:
        i = df_train.columns.get_loc(var_name)
        shape = len(np.unique(t[i]))
        x = t[i].astype('int')
        print(f'Adding Index Variable: {var_name}')
        ind_beta = Normal(f'beta_{var_name}', sd=6000, shape=shape)
        channel_contrib = ind_beta[x]
        response_mean.append(channel_contrib)

    # noise
    sigma = Exponential('sigma', 10)

    # define likelihood
    likelihood = Normal(outcome, mu=sum(response_mean), sd=sigma, observed=df[outcome].values)

    trace = pm.sample(tune=3000, cores=4, init='advi')
Here are the betas from the model. Notice that ADWORDS_SEARCH is one of the most important features:
When I zeroed out the ADWORDS_SEARCH feature, I got practically identical predictions, which cannot be right:
with model:
    y_pred = sample_posterior_predictive(trace)

mod_channel = 'ADWORDS_SEARCH'
df_mod = df_train.copy(deep=True)
df_mod.iloc[12:-12, df_mod.columns.get_loc(mod_channel)] = 0

with model:
    pm.set_data({'features':df_mod})
    y_pred_mod = pm.sample_posterior_predictive(trace)
After zeroing out ADWORDS_SEARCH, I would expect the predictions to be significantly lower than the original ones, since ADWORDS_SEARCH is one of the most important features according to the betas.
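One way to put a number on "practically identical" is to compare the posterior-predictive means of the two runs on the rows that were zeroed out. The following is only a sketch: the arrays are synthetic stand-ins for y_pred[outcome] and y_pred_mod[outcome] (assumed to have shape draws × observations), and the helper name mean_prediction_shift is made up for illustration.

```python
import numpy as np

def mean_prediction_shift(pred, pred_mod, rows=slice(12, -12)):
    """Average change in the posterior-predictive mean over the given rows.

    pred, pred_mod: arrays of shape (n_draws, n_obs), e.g. the arrays stored
    under the outcome name in the dicts returned by sample_posterior_predictive.
    rows defaults to the same 12:-12 window that was zeroed out in df_mod.
    """
    base = np.asarray(pred).mean(axis=0)[rows]     # mean over draws, per observation
    mod = np.asarray(pred_mod).mean(axis=0)[rows]
    return float((mod - base).mean())

# Synthetic stand-ins (NOT real model output): 200 draws, 52 observations.
rng = np.random.default_rng(0)
pred = rng.normal(loc=1000.0, scale=50.0, size=(200, 52))
pred_lower = pred - 300.0  # what a genuine drop in the response would look like

print(mean_prediction_shift(pred, pred))        # 0.0 for identical predictions
print(mean_prediction_shift(pred, pred_lower))  # approximately -300
```

With real output, two separate calls to sample_posterior_predictive will differ by Monte Carlo noise even on identical inputs, so a shift close to zero (rather than exactly zero) is what "no effect from the new data" would look like.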
I started questioning the model, but it seems to perform well:
MAPE = 6.3%
r2 = 0.7
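For reference, here is a minimal sketch of how metrics like these are commonly computed. It assumes the posterior-predictive mean is scored against the observed outcome, which may not be exactly how the numbers above were produced; the function names and toy values are illustrative only.

```python
import numpy as np

def mape(y_true, y_hat):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs((y_true - y_hat) / y_true)) * 100)

def r2(y_true, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return float(1 - ss_res / ss_tot)

# Toy numbers for illustration only (not the data from this post).
y_true = np.array([100.0, 200.0, 400.0])
y_hat = np.array([110.0, 180.0, 400.0])

print(f"MAPE = {mape(y_true, y_hat):.1f}%")
print(f"r2 = {r2(y_true, y_hat):.3f}")
```

With the model above, y_hat would be something like y_pred[outcome].mean(axis=0) and y_true would be df[outcome].values.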
I also tried passing the original training data set to pm.set_data(), and I got very similar results as well.
This is the difference between the predictions from the training data and from the new data:
This is the difference between the predictions from the training data and from the same training data passed through pm.set_data():
Does anyone know what I’m doing wrong?