"shape mismatch" when new data is set as a predictor for sample_posterior_predictive

With a simple linear regression, pymc.sample_posterior_predictive raises this error:

shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (100,) and arg 1 with shape (20,).

I fitted the model with 100 data points as the predictor, then set 20 new data points as the predictor for prediction. Why does this happen?

macOS: Big Sur (11.7)
pymc: 4.4.0

The code is something like this:


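import pymc as pm

# `data` holds the 100 training points (its definition is not shown in this post)
with pm.Model() as model: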

    sigma = pm.HalfCauchy('sigma', beta=10)
    intercept = pm.Normal('intercept', 80, sigma=50)
    beta = pm.Normal('beta', 0, sigma=50)

    X = pm.MutableData('X', data['A'], dims='obs_id')  # data['A']'s shape is (100,)
    mu = intercept + X * beta
    
    pm.Normal('y', mu, sigma=sigma, observed=data['y'])
    
    mcmc_result = pm.sample(
        draws=2000,
        chains=4,
        random_seed=123,
    )

new_data = range(11, 31)  # this has 20 data points

with model:

    model.set_data('X', new_data, coords={'obs_id': range(len(new_data))})

    y_pred = pm.sample_posterior_predictive(
        mcmc_result,
        var_names=['y'],
        return_inferencedata=True,
        predictions=True,
        random_seed=123,
    )

I think you have to use model.set_dim if you change the length of a dimension that has coordinate values assigned to it.

The "Refitting PyMC models with ArviZ" page in the ArviZ dev documentation is not specifically about this, but it does this among other things, so it might also be helpful.

@OriolAbril Thank you for your help. I changed my code based on the linked page, and it seems to work. In my case, the minimal change required was adding the same dimension name to 'y':

pm.Normal('y', mu, sigma=sigma, observed=data['y'], dims='obs_id')
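
For anyone hitting the same error, here is a rough NumPy-only sketch of what seems to be going on. It is an illustration of the broadcasting rule, not PyMC's internal code, and the helper name broadcast_or_error is made up for this example. Without dims on 'y', the likelihood appears to keep the length of the 100 observed values while mu follows the 20 new predictor values, and those two shapes cannot be broadcast together. Once 'y' shares 'obs_id' with X, both lengths change together.

```python
import numpy as np


def broadcast_or_error(*shapes):
    """Return (broadcast_shape, None) on success or (None, message) on failure."""
    try:
        return np.broadcast_shapes(*shapes), None
    except ValueError as err:
        return None, str(err)


# Without a shared dim: 'y' keeps the training length, mu has the new length.
y_shape_fixed = (100,)
mu_shape_new = (20,)
result, message = broadcast_or_error(y_shape_fixed, mu_shape_new)
print("without shared dim:", result, message)

# With a shared dim: both follow the new length of 'obs_id'.
y_shape_resized = (20,)
result, message = broadcast_or_error(y_shape_resized, mu_shape_new)
print("with shared dim:", result, message)
```

The first call fails because lengths 100 and 20 are both greater than 1 and differ, while the second succeeds because the lengths match.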