Setting new data for predictions, conflicting size with dims

Hi, I’m facing an issue when using set_data for posterior prediction with new data. I split my dataset into the usual train and test sets. I set new coordinates for the dimensions to match the test data, but I get an error stating that the model still expects the dimensions of the training data.

Here is my model, which I think is pretty simple:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as batting_score_model:
    batting_score_model.add_coord('observations', np.arange(X_train.shape[0]), mutable = True)
    batting_score_model.add_coord('predictors', X_train.columns.values, mutable = True)
    # Define X_train as a 2D data container with dimensions for observations and predictors
    X_data = pm.Data("X_data", X_train.values, dims=('observations','predictors'), mutable=True)
    
    # Prior on error SD
    sigma = pm.HalfNormal("sigma", 25)
    
    # Global shrinkage prior
    tau = pm.HalfStudentT("tau", 2, D0 / (D - D0) * sigma / np.sqrt(N))
    # Local shrinkage prior
    lam = pm.HalfStudentT("lam", 5, dims="predictors")
    c2 = pm.InverseGamma("c2", 1, 1)
    z = pm.Normal("z", 0.0, 1.0, dims="predictors")
    # Shrunken coefficients
    beta = pm.Deterministic(
        "beta", z * tau * lam * pt.sqrt(c2 / (c2 + tau**2 * lam**2)), dims="predictors"
    )
    # No shrinkage on intercept
    beta0 = pm.Normal("beta0", 100, 25.0)
    
    # Model mean
    mu = pm.Deterministic("mu", beta0 + pt.dot(X_data, beta))
    
    # Likelihood
    batting_scores = pm.Normal("batting_scores", mu=mu, sigma=sigma, observed=y_train.values, dims='observations')

And below I set the new data for posterior sampling:

with batting_score_model:
    # Update the model with X_test values for predictions
    pm.set_data({"X_data": X_test.values}, coords={"observations": np.arange(X_test.shape[0]),"predictors":X_test.columns.values})
    
    # Sample from the posterior predictive distribution
    posterior_predictions = pm.sample_posterior_predictive(
        trace, var_names=["batting_scores", "mu"]
    )

But I’m getting:

ValueError: conflicting sizes for dimension 'observations': length 7005 on the data but length 1752 on coordinate 'observations'

Thanks for helping!

When you change a dimension of the model (e.g., "observations" in this case), you need to make sure that all the pieces of the model that were originally associated with this dimension are updated as well. You have changed "X_data", but the observed batting_scores are also associated with the "observations" dim. They still have the original shape and thus conflict. Since you are generating new batting_scores via posterior predictive sampling, you can just replace the original observations with dummy values, as long as they match the new shape/dimension.

Thank you very much Cluhmann, this worked and helped me understand why!

The other option (documented in set_data) is to define how the shape of the observed variable depends on its inputs, or to use dims and update them. In either case no dummy data is needed.

Usually something like shape=mu.shape is all that’s needed. By default, observed variables have shape=observed.shape, which is a convenient shortcut for the user but then bites in posterior predictive sampling with new dims.