Hi, I'm facing an issue when using set_data for posterior prediction with new data. I split my dataset into the usual train and test sets. I set the new dimension coordinates for the test data, but I get an error stating that it still expects the dimensions of the training data.
Here is my model, which I think is pretty simple:
import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as batting_score_model:
    batting_score_model.add_coord("observations", np.arange(X_train.shape[0]), mutable=True)
    batting_score_model.add_coord("predictors", X_train.columns.values, mutable=True)
    # Define X_train as a 2D data container with dimensions for observations and predictors
    X_data = pm.Data("X_data", X_train.values, dims=("observations", "predictors"), mutable=True)
    # Prior on error SD
    sigma = pm.HalfNormal("sigma", 25)
    # Global shrinkage prior
    tau = pm.HalfStudentT("tau", 2, D0 / (D - D0) * sigma / np.sqrt(N))
    # Local shrinkage prior
    lam = pm.HalfStudentT("lam", 5, dims="predictors")
    c2 = pm.InverseGamma("c2", 1, 1)
    z = pm.Normal("z", 0.0, 1.0, dims="predictors")
    # Shrunken coefficients
    beta = pm.Deterministic(
        "beta", z * tau * lam * pt.sqrt(c2 / (c2 + tau**2 * lam**2)), dims="predictors"
    )
    # No shrinkage on intercept
    beta0 = pm.Normal("beta0", 100, 25.0)
    # Model mean
    mu = pm.Deterministic("mu", beta0 + pt.dot(X_data, beta))
    # Likelihood
    batting_scores = pm.Normal(
        "batting_scores", mu=mu, sigma=sigma, observed=y_train.values, dims="observations"
    )
Below, I set the new data for posterior predictive sampling:
with batting_score_model:
    # Update the model with X_test values for predictions
    pm.set_data(
        {"X_data": X_test.values},
        coords={
            "observations": np.arange(X_test.shape[0]),
            "predictors": X_test.columns.values,
        },
    )
    # Sample from the posterior predictive distribution
    posterior_predictions = pm.sample_posterior_predictive(
        trace, var_names=["batting_scores", "mu"]
    )
But I'm getting:
ValueError: conflicting sizes for dimension 'observations': length 7005 on the data but length 1752 on coordinate 'observations'
Thanks for helping!