Thank you, Jessie. After several weeks I managed to pick up the project again, and thanks to your guidance I understood where the error was in the model’s definition. It is working properly now! Unfortunately, I find myself back in the initial situation: I still get an error during the out-of-sample phase.
Here is the graph:
Here is the model code:
# Imports assumed from the PyMC 5 environment shown in the traceback below.
# geometric_adstock_tt and logistic_function are my own helpers, defined elsewhere.
import numpy as np
import pymc as pm
import pytensor.tensor as tt

features = ['feature1', 'feature2']
control_vars = ['seasonal', 'dummy']
target = df_new['y'].to_numpy()
df_features = df_new[features]
n_obs, n_features = df_features.shape
coords = {'features': features,
          'all_vars': features}
n_controls = 0  # default so the check inside the model works without controls
if control_vars:
    df_controls = df_new[control_vars]
    _, n_controls = df_controls.shape
    coords.update({'controls': control_vars,
                   'all_vars': features + control_vars})

with pm.Model(coords=coords) as basic_model:
    # The first argument is the name each data container is registered under
    X = pm.MutableData('feature_data', df_features.values)
    y = pm.MutableData('targets', df_new['y'].values.squeeze())
    n_obs = X.shape[0]

    betas = pm.HalfNormal('beta', sigma=2, dims=['features'])
    decays = pm.Beta('decay', alpha=3, beta=3, dims=['features'])
    sat = pm.Gamma('sat', alpha=3, beta=1, dims=['features'])

    # One contribution per feature: adstock, then saturation, then scaling
    contributions = []
    for i in range(n_features):
        x = logistic_function(geometric_adstock_tt(X[:, i], decays[i]), sat[i]) * betas[i]
        contributions.append(x)

    # Linear contributions from the control variables
    if n_controls > 0:
        Z = pm.MutableData('control_data', df_controls.values)
        control_betas = pm.Normal('control_beta', sigma=2, dims=['controls'])
        for w in range(n_controls):
            z = Z[:, w] * control_betas[w]
            contributions.append(z)

    mu = pm.Deterministic("contributions", tt.stack(contributions).T, dims=['all_vars'])
    sigma = pm.HalfNormal('sigma', sigma=1)
    y_hat = pm.Normal("y_hat", mu=mu.sum(axis=-1), sigma=sigma, observed=y, shape=X.shape[0])

with basic_model:
    # draw 1000 posterior samples
    idata = pm.sample(idata_kwargs={'dims': {'contributions': [None, 'all_vars']}})
    #idata = pm.sample(return_inferencedata=True, tune= 1000)

with basic_model:
    post = pm.sample_posterior_predictive(idata)
And it works.
Then the out-of-sample code:
x_test = df_x_test.values.astype(np.float64)
z_test = df_z_test.values.astype(np.float64)
y_test = test_target
with basic_model_2:
    pm.set_data({"X":x_test, "y":y_test, "Z":z_test})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)
But I get this error:
Traceback (most recent call last):
  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1602 in __getitem__
    return self.named_vars[self.name_for(key)]
KeyError: 'X'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  Cell In[26], line 2
    pm.set_data({"X":x_test, "y":y_test, "z":z_test})
  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:2048 in set_data
    model.set_data(variable_name, new_value, coords=coords)
  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1177 in set_data
    shared_object = self[name]
  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1604 in __getitem__
    raise e
  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1599 in __getitem__
    return self.named_vars[key]
KeyError: 'X'
The test objects have the same shape as those used in the main model.
It looks like an error related to labels (the variable names), but I’m having trouble pinpointing the issue.
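One way to test the label hypothesis is to compare the keys passed to `pm.set_data` with the names registered on the model. The sketch below is only an illustration: `unknown_data_names` is a hypothetical helper, and the stand-in object mimics just the `named_vars` mapping that appears in the traceback, filled with the names given to `pm.MutableData` in the model code above.

```python
from types import SimpleNamespace


def unknown_data_names(model, new_data):
    """Return the keys of new_data that are not registered names on the model."""
    return sorted(name for name in new_data if name not in model.named_vars)


# Stand-in for the fitted model (assumption): only `named_vars` matters here.
# Its keys mirror the first argument of each pm.MutableData call.
stand_in_model = SimpleNamespace(
    named_vars={"feature_data": None, "targets": None, "control_data": None}
)

# Keys as used in the failing pm.set_data call
bad_keys = unknown_data_names(stand_in_model, {"X": 1, "y": 2, "Z": 3})

# Keys matching the registered names
good_keys = unknown_data_names(
    stand_in_model, {"feature_data": 1, "targets": 2, "control_data": 3}
)

print(bad_keys)   # names the model does not know
print(good_keys)  # empty list when every key is registered
```

With the real model the call would be `unknown_data_names(basic_model, {"X": x_test, "y": y_test, "Z": z_test})`. If the names are indeed the problem, passing `feature_data`, `targets`, and `control_data` as the keys to `pm.set_data` would presumably be the fix, since those are the names the data containers were registered under rather than the Python variables `X`, `y`, and `Z`.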
My ultimate goal is to obtain a dataframe with out-of-sample predictions not only for y_hat but also for the features and control variables.
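For the dataframe itself, here is a rough sketch of the reshaping step, assuming the predictive draws arrive as arrays shaped `(chain, draw, obs)` for `y_hat` and `(chain, draw, obs, var)` for `contributions`. The function name `predictions_to_frame` and the random stand-in arrays are made up for illustration. With real output the arrays would come from the `predictions` group of `idata`, and `contributions` would probably have to be requested explicitly through `var_names` in `pm.sample_posterior_predictive`.

```python
import numpy as np
import pandas as pd


def predictions_to_frame(contrib_draws, y_hat_draws, var_names):
    """Average draws over (chain, draw); return one row per test observation."""
    contrib_mean = contrib_draws.mean(axis=(0, 1))  # shape (n_obs, n_vars)
    y_hat_mean = y_hat_draws.mean(axis=(0, 1))      # shape (n_obs,)
    frame = pd.DataFrame(contrib_mean, columns=list(var_names))
    frame["y_hat"] = y_hat_mean
    return frame


# Synthetic stand-ins (assumption): 2 chains, 50 draws, 10 test rows, 4 variables
rng = np.random.default_rng(0)
all_vars = ["feature1", "feature2", "seasonal", "dummy"]
contrib = rng.normal(size=(2, 50, 10, len(all_vars)))
y_hat = contrib.sum(axis=-1) + rng.normal(scale=0.1, size=(2, 50, 10))

df_pred = predictions_to_frame(contrib, y_hat, all_vars)
print(df_pred.head())
```

This keeps only posterior means; intervals could be added the same way by taking quantiles over the first two axes instead of the mean.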
Thank you in advance for your help!
