Hey @ricardoV94, I am facing a similar issue. My observed outcome variable has missing values, and I would like PyMC to impute them automatically from the model's sampling distribution. My PyMC model looks like this:

```python
import numpy as np
import pymc as pm

with pm.Model(coords=coords) as STS:
    time = pm.Data("time", train_time_scaled)
    ft = pm.Data("fourier_series", train_fourier_series.to_numpy())
    k0 = pm.Normal("k0", 0, 1)
    m = pm.Normal("m", 0, 1)
    b = pm.Normal("b", 0, 1, dims="fourier_coef")
    seasonal = pm.Deterministic("seasonal", pm.math.dot(ft, b))
    mu = np.linspace(0.1, 0.8, len(coords['changepoints']))
    s = pm.Data("s", mu, dims="changepoints")
    delta = pm.Laplace("delta", mu=0, b=0.2, dims="changepoints")
    a = pm.Deterministic("a", pm.math.where(time[:, None] < s[None, :], np.array(0.), np.array(1.)))
    gamma = pm.Deterministic("gamma", -s*delta, dims="changepoints")
    func = (k0 + pm.math.dot(a, delta))*time + (m + pm.math.dot(a, gamma))
    trend = pm.Deterministic("trend", func)
    error = pm.HalfNormal("error", sigma=1)
    pm.Normal(
        "likelihood",
        mu=trend + seasonal,
        sigma=error,
        observed=train_xs_scaled
    )
```

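To make the trend term concrete: it is a Prophet-style piecewise-linear trend. `a[i, j]` switches from 0 to 1 once time reaches changepoint `s[j]`, each `delta[j]` changes the slope from that point on, and `gamma = -s*delta` shifts the offset so the segments join without jumps. A minimal NumPy sketch with made-up numbers (the helper `piecewise_trend` and all values are illustrative assumptions, not part of the model above):

```python
import numpy as np

def piecewise_trend(t, k0, m, s, delta):
    """Hypothetical helper mirroring the model's trend term in plain NumPy.

    t: (n,) scaled time, s: (c,) changepoint locations, delta: (c,) slope changes.
    """
    # a[i, j] = 1 once t[i] has reached changepoint s[j], else 0 (same rule as pm.math.where above)
    a = np.where(t[:, None] < s[None, :], 0.0, 1.0)
    # gamma offsets make each extra slope term delta[j] * (t - s[j]), which is 0 at t = s[j]
    gamma = -s * delta
    return (k0 + a @ delta) * t + (m + a @ gamma)

t = np.linspace(0.0, 1.0, 11)
s = np.array([0.3, 0.6])
delta = np.array([1.0, -2.0])
trend = piecewise_trend(t, k0=0.5, m=0.1, s=s, delta=delta)
```

After both changepoints the slope is `k0 + delta.sum()`, here `-0.5`. Because `a` is built from `time`, replacing `time` with `pm.set_data` also changes the length of `trend` (and of `mu` in the likelihood) to the length of the new time vector.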
I am passing `train_xs_scaled` directly because it contains missing values, so I cannot wrap it in `pm.Data()`.
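Roughly speaking, when `observed` contains NaNs, PyMC's automatic imputation splits the likelihood into an observed part and a missing (imputed) part using a boolean mask derived from that training array, so the mask has the training length (432 here). A minimal NumPy sketch of that split; it only imitates the idea and is not PyMC's internal code, and `split_observed` is a hypothetical name:

```python
import numpy as np

def split_observed(y):
    """Hypothetical helper: separate observed values from missing positions."""
    mask = np.isnan(y)                  # True where a value is missing
    observed_values = y[~mask]          # entries the likelihood is conditioned on
    missing_idx = np.flatnonzero(mask)  # positions that would be imputed
    return mask, observed_values, missing_idx

y_train = np.array([1.0, np.nan, 0.5, np.nan, 2.0])
mask, obs, missing = split_observed(y_train)
```

The mask keeps the shape of the array it was computed from, regardless of what is later passed to `pm.set_data`.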
Now I want to produce out-of-sample (OOS) forecasts with this model. For that I have:

```python
with STS:
    pm.set_data(
        new_data=dict(
            time=test_time_scaled,
            fourier_series=test_fourier_series
        )
    )
    predictions = pm.sample_posterior_predictive(idata)
```

But this results in an error:

```
IndexError: boolean index did not match indexed array along dimension 0; dimension is 49 but corresponding boolean dimension is 432
```

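The same kind of error can be reproduced with plain NumPy: indexing a length-49 array with a boolean mask built for 432 rows fails. The sizes come from this post; the random mask and zero array are made-up stand-ins, and the exact wording of the message depends on the NumPy version.

```python
import numpy as np

rng = np.random.default_rng(0)
train_mask = rng.random(432) < 0.1  # stand-in for the missing-value mask of train_xs_scaled
mu_forecast = np.zeros(49)          # stand-in for trend + seasonal over the test period

try:
    mu_forecast[~train_mask]        # mask has 432 entries, the array only 49
    error_message = None
except IndexError as exc:
    error_message = str(exc)
```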
Here the training set has 432 observations and the test set has 49. Is there any way to modify the model so that it does not use the old observed data for `likelihood`, since that data should not be needed to produce the forecast anyway?
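One way to check that the forecast really does not depend on the training observations is to compute the out-of-sample predictive draws by hand from the posterior samples of `k0`, `m`, `b`, `delta` and `error` plus the test covariates. This sidesteps the masked `likelihood` variable entirely; it is a workaround outside PyMC, not a fix to the model. A minimal NumPy sketch, assuming the draws have already been flattened to one row per posterior sample (`forecast_from_draws` is a hypothetical helper, and the random draws and test inputs are placeholders):

```python
import numpy as np

def forecast_from_draws(t, ft, s, k0, m, b, delta, error, rng):
    """Hypothetical helper: posterior predictive draws for new time points.

    t: (n,) test time, ft: (n, f) test Fourier features, s: (c,) changepoints.
    k0, m, error: (d,) draws; b: (d, f) draws; delta: (d, c) draws.
    """
    a = np.where(t[:, None] < s[None, :], 0.0, 1.0)  # (n, c), same indicator as the model
    gamma = -s[None, :] * delta                      # (d, c)
    slope = k0[:, None] + delta @ a.T                # (d, n)
    offset = m[:, None] + gamma @ a.T                # (d, n)
    trend = slope * t[None, :] + offset              # (d, n)
    seasonal = b @ ft.T                              # (d, n)
    # one predictive sample per posterior draw
    return rng.normal(trend + seasonal, error[:, None])

# Placeholder inputs. With real results, draws could be taken from idata, e.g.
# idata.posterior["b"].stack(sample=("chain", "draw")).transpose("sample", ...).values,
# and s should match the changepoint locations used when fitting (assumption).
rng = np.random.default_rng(1)
n_draws, n_test, n_fourier, n_cp = 200, 49, 6, 5
t_test = np.linspace(1.0, 1.1, n_test)
ft_test = rng.normal(size=(n_test, n_fourier))
s = np.linspace(0.1, 0.8, n_cp)
draws = dict(
    k0=rng.normal(size=n_draws),
    m=rng.normal(size=n_draws),
    b=rng.normal(size=(n_draws, n_fourier)),
    delta=rng.laplace(0.0, 0.2, size=(n_draws, n_cp)),
    error=np.abs(rng.normal(size=n_draws)),
)
y_pred = forecast_from_draws(t_test, ft_test, s, rng=rng, **draws)
```

The result has one row per posterior draw and one column per test time point; no training observation enters the computation.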