Setting new data for predictions, conflicting size with dims

ArnoSixer · November 8, 2024, 3:29pm

Hi, I’m facing an issue when using set_data for posterior prediction with new data. I split my dataset into the usual train and test sets. I set the new dimension coordinates for the test data, but I get an error stating that it still expects the dimensions of the train data.

Here is my model, which I think is pretty simple:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as batting_score_model:
    batting_score_model.add_coord('observations', np.arange(X_train.shape[0]), mutable = True)
    batting_score_model.add_coord('predictors', X_train.columns.values, mutable = True)
    # Define X_train as a 2D data container with dimensions for observations and predictors
    X_data = pm.Data("X_data", X_train.values, dims=('observations','predictors'), mutable=True)
    
    # Prior on error SD
    sigma = pm.HalfNormal("sigma", 25)
    
    # Global shrinkage prior
    tau = pm.HalfStudentT("tau", 2, D0 / (D - D0) * sigma / np.sqrt(N))
    # Local shrinkage prior
    lam = pm.HalfStudentT("lam", 5, dims="predictors")
    c2 = pm.InverseGamma("c2", 1, 1)
    z = pm.Normal("z", 0.0, 1.0, dims="predictors")
    # Shrunken coefficients
    beta = pm.Deterministic(
        "beta", z * tau * lam * pt.sqrt(c2 / (c2 + tau**2 * lam**2)), dims="predictors"
    )
    # No shrinkage on intercept
    beta0 = pm.Normal("beta0", 100, 25.0)
    
    # Model mean
    mu = pm.Deterministic("mu", beta0 + pt.dot(X_data, beta))
    
    # Likelihood
    batting_scores = pm.Normal("batting_scores", mu=mu, sigma=sigma, observed=y_train.values, dims='observations')

And below I set the new data for posterior sampling:

with batting_score_model:
    # Update the model with X_test values for predictions
    pm.set_data({"X_data": X_test.values}, coords={"observations": np.arange(X_test.shape[0]),"predictors":X_test.columns.values})
    
    # Sample from the posterior predictive distribution
    posterior_predictions = pm.sample_posterior_predictive(
        trace, var_names=["batting_scores", "mu"]
    )

but I’m getting this error:

ValueError: conflicting sizes for dimension 'observations': length 7005 on the data but length 1752 on coordinate 'observations'

Thanks for helping!

cluhmann · November 8, 2024, 4:10pm

When you change a dimension of the model ("observations" in this case), you need to update every piece of the model that was originally associated with that dimension. You have changed "X_data", but the observed batting_scores are also associated with the "observations" dim; they still have the original shape and thus conflict. Since you are generating new batting_scores via posterior predictive sampling, you can just replace the original observations with dummy values, as long as they match the new shape/dimension.

ArnoSixer · November 8, 2024, 7:25pm

Thank you very much Cluhmann, this worked and helped me understand why!

ricardoV94 · November 8, 2024, 9:34pm

The other option (documented in set_data) is to define how the shape of the observed variable depends on the inputs, or to use dims and update them. In either case no dummy data is needed.

Usually something like shape=mu.shape is all that’s needed. By default, observed variables get shape=observed.shape, which is a convenient shortcut for the user but then bites in posterior predictive sampling with new dims.
