Hi Ricardo, thank you for your answer.
Now there seems to be a shape issue, and maybe I'm getting stuck on something trivial.
Here is the full code:
control_vars = ['c1', 'c2', 'c3', 'c4', 'c5']
target = df_scaled['y'].to_numpy()
df_controls = df_scaled[control_vars]
n_obs, n_controls = df_controls.shape
coords = {'controls': control_vars,
          'all_vars': control_vars}
with pm.Model(coords=coords) as model:
    X = pm.MutableData('control_data', df_controls.values)
    y = pm.MutableData('targets', df_scaled['y'].values.squeeze())
    n_obs = X.shape[0]
    contributions = []
    control_betas = pm.Normal('control_beta', sigma=2, dims=['controls'])
    for w in range(n_controls):
        x = X[:, w] * control_betas[w]
        contributions.append(x)
    mu = pm.Deterministic("contributions", tt.stack(contributions).T, dims=['controls'])
    sigma = pm.HalfNormal('sigma', sigma=1)
    y_hat = pm.Normal("y_hat", mu=mu.sum(axis=-1), sigma=sigma, observed=target, shape=X.shape[0])
with model:
    idata = pm.sample(idata_kwargs={'dims': {'contributions': [None, 'controls']}})
Up to this point everything works. The problem appears in the out-of-sample part:
x_test = df_test_controls.values
with model:
    pm.set_data({"control_data": x_test})
    idata.extend(pm.sample_posterior_predictive(idata, var_names=["y_hat", "contributions"]))
But I obtain the following error:
ValueError: conflicting sizes for dimension 'controls': length 110 on the data but length 5 on coordinate 'controls'
The shapes are:
df_test_controls.values = Array of float64 (110,5)
df_controls.values = Array of float64 (110,5)
coords = Dict (5)
It seems there is a mismatch in dimensions. Perhaps a naming issue?
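To test my own guess, here is a plain-Python sketch. It is not PyMC or ArviZ code, and `check_dims` is a helper I made up for illustration. It assumes that dim names are paired with array axes in order, starting from the first axis, which is what the error message suggests but which I have not verified in the library source.

```python
def check_dims(shape, dims, coords):
    """Pair each dim name with an array axis (first axis first) and compare lengths.

    Hypothetical helper: mimics the kind of check that could produce the error above.
    """
    problems = []
    for axis, name in enumerate(dims):
        if name is None:  # unnamed axis, nothing to compare against
            continue
        if axis >= len(shape):
            problems.append(f"dim '{name}' has no matching axis")
            continue
        expected = len(coords[name])
        if shape[axis] != expected:
            problems.append(
                f"conflicting sizes for dimension '{name}': "
                f"length {shape[axis]} on the data but length {expected} on the coordinate"
            )
    return problems


coords = {"controls": ["c1", "c2", "c3", "c4", "c5"]}
contributions_shape = (110, 5)  # (n_obs, n_controls), matching the shapes listed above

# dims as declared on the Deterministic: a single name for a 2-D array
one_name = check_dims(contributions_shape, ["controls"], coords)

# dims as passed through idata_kwargs in pm.sample: one entry per axis
two_names = check_dims(contributions_shape, [None, "controls"], coords)

print(one_name)
print(two_names)
```

Under that assumption, the single-name declaration attaches 'controls' to the axis of length 110, while the two-entry version lines up with the 5 controls. If this is what happens, it may matter that the `[None, 'controls']` override is passed to `pm.sample` but not to `pm.sample_posterior_predictive`, where I pass no `idata_kwargs`.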
Thank you