PyMC 5 out-of-sample predictions

Thank you, Jessie. After several weeks I picked the project up again and, thanks to your guidance, I found the error in the model's definition. The model now works properly. Unfortunately, I am back where I started: I still get an error during the out-of-sample phase.

Here is the model graph (image not shown):

Here is the model code:

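import numpy as np
import pymc as pm
import pytensor.tensor as tt  # assumed source of the `tt` alias under PyMC 5
# logistic_function and geometric_adstock_tt are user-defined helpers (not shown)

features = ['feature1', 'feature2']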
control_vars = ['seasonal','dummy'] 
target = df_new['y'].to_numpy()
df_features  = df_new[features]
n_obs, n_features = df_features.shape
coords = {'features':features, 
          'all_vars':features}

n_controls = 0  # stays 0 when there are no control variables
if control_vars:
    df_controls  = df_new[control_vars]
    _, n_controls = df_controls.shape
    coords.update({'controls':control_vars,
                   'all_vars':features + control_vars})

with pm.Model(coords=coords) as basic_model:
    X = pm.MutableData('feature_data', df_features.values)
    y = pm.MutableData('targets', df_new['y'].values.squeeze())

    n_obs = X.shape[0]
    
    betas = pm.HalfNormal('beta', sigma = 2, dims=['features'])
    decays = pm.Beta('decay', alpha=3, beta=3, dims=['features'])
    sat = pm.Gamma('sat', alpha=3, beta=1, dims=['features'])
    contributions = []
    
    for i in range(n_features):
        x = logistic_function(geometric_adstock_tt(X[:, i], decays[i]),sat[i])*betas[i]
        contributions.append(x)
        
    if n_controls > 0:
        Z = pm.MutableData('control_data', df_controls.values)
        control_betas = pm.Normal('control_beta', sigma = 2, dims=['controls'])
        
        for w in range(n_controls):
            z = Z[:,w]*control_betas[w]
            contributions.append(z)
    
    mu = pm.Deterministic("contributions", tt.stack(contributions).T, dims=['all_vars'])
    sigma = pm.HalfNormal('sigma', sigma=1)
    
    y_hat = pm.Normal("y_hat", mu=mu.sum(axis=-1), sigma=sigma, observed=y, shape=X.shape[0])
    


with basic_model:
    # draw 1000 posterior samples
    idata = pm.sample(idata_kwargs={'dims':{'contributions':[None, 'all_vars']}})
    #idata = pm.sample(return_inferencedata=True, tune= 1000)   
    
with basic_model:
    post = pm.sample_posterior_predictive(idata)

And it works :slight_smile:

Then the out-of-sample code:

x_test = df_x_test.values.astype(np.float64)
z_test = df_z_test.values.astype(np.float64)
y_test = test_target

with basic_model_2:
    pm.set_data({"X":x_test, "y":y_test, "Z":z_test})
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True, predictions=True)

But I get this error :frowning:

Traceback (most recent call last):

  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1602 in __getitem__
    return self.named_vars[self.name_for(key)]

KeyError: 'X'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  Cell In[26], line 2
    pm.set_data({"X":x_test, "y":y_test, "z":z_test})

  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:2048 in set_data
    model.set_data(variable_name, new_value, coords=coords)

  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1177 in set_data
    shared_object = self[name]

  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1604 in __getitem__
    raise e

  File ~\Anaconda3\envs\pymc5_spyder\lib\site-packages\pymc\model.py:1599 in __getitem__
    return self.named_vars[key]

KeyError: 'X'

The test objects have the same shape as those used in the main model.
It looks like an error that is possibly related to labels, but I'm having trouble pinpointing the issue.
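One thing that may be worth checking, offered as a sketch rather than a confirmed diagnosis: the traceback shows `set_data` looking each key up in `named_vars`, and the model code above registers its data containers as `'feature_data'`, `'targets'` and `'control_data'`, while the failing call uses `"X"`, `"y"` and `"Z"`. The helper below is hypothetical (it is not part of PyMC) and only compares the two sets of names. It also assumes that `basic_model_2` was built the same way as `basic_model`.

```python
def find_unknown_data_keys(registered_names, new_data):
    """Return the keys of new_data that are not among registered_names, sorted."""
    registered = set(registered_names)
    return sorted(key for key in new_data if key not in registered)


# Names as passed to pm.MutableData in the model code above.
registered = ["feature_data", "targets", "control_data"]

# Keys as passed to pm.set_data in the failing call (values are placeholders).
failing_call = {"X": None, "y": None, "Z": None}

# Keys that reuse the registered names instead of the Python variable names.
matching_call = {"feature_data": None, "targets": None, "control_data": None}

unknown_before = find_unknown_data_keys(registered, failing_call)
unknown_after = find_unknown_data_keys(registered, matching_call)

print(unknown_before)  # keys that would trigger the KeyError
print(unknown_after)   # empty when every key is a registered name
```

With the real model, `basic_model.named_vars` could presumably be passed as `registered_names`. If the mismatch is indeed the cause, a call such as `pm.set_data({"feature_data": x_test, "targets": y_test, "control_data": z_test})` would be the thing to try.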

My ultimate goal is to obtain a dataframe with out-of-sample predictions not only for y_hat but also for the features and control variables.
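As a rough sketch of that last step: assuming the out-of-sample draws of `contributions` are available as an array of shape (chain, draw, observation, variable), for example from `idata.predictions["contributions"].values` after passing `var_names=["contributions", "y_hat"]` to `pm.sample_posterior_predictive`, they could be averaged into a dataframe as shown below. The function name, the `predicted_mean` column and the random stand-in data are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd


def summarize_contributions(draws, var_names):
    """Average over chains and draws: one row per observation, one column per variable."""
    draws = np.asarray(draws)
    if draws.ndim != 4 or draws.shape[-1] != len(var_names):
        raise ValueError("expected shape (chain, draw, obs, len(var_names))")
    means = draws.mean(axis=(0, 1))  # shape (obs, n_vars)
    df = pd.DataFrame(means, columns=list(var_names))
    # The model uses mu.sum(axis=-1) as the mean of y_hat, so the row sum of
    # the mean contributions is the posterior-mean prediction per observation.
    df["predicted_mean"] = means.sum(axis=1)
    return df


# Random stand-in for real posterior draws: 2 chains, 50 draws, 5 test rows.
rng = np.random.default_rng(0)
all_vars = ["feature1", "feature2", "seasonal", "dummy"]
fake_draws = rng.normal(size=(2, 50, 5, len(all_vars)))

summary = summarize_contributions(fake_draws, all_vars)
print(summary.shape)
```

This only gives point estimates. Intervals would need quantiles over the same two axes instead of the mean.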

Thank you in advance for your help!