How to properly do out-of-sample prediction for hierarchical model

What you were doing was in the right direction. Just the indexing part seemed to be wrong that’s all. Here is how I would think it would work:

# Predict two new groups
new_idx = [1, 1, 2, 3]
new_coords = {
    "id":[0, 1],
    "new_id": [2, 3],
    "obs_idx": list(range(len(new_idx))),
}
with pm.Model(coords=new_coords) as pred_m:
    x = pm.Data("x", [0, 0, 0, 0], dims=("obs_idx",))
    mu_group = pm.Normal("mu_group")
    # Needs a new name, so old b is not used!
    new_b = pm.Normal("new_b", mu_group, dims=("new_id",))
    b = pm.Normal("b", mu_group, dims=("id",))
    mu = pm.math.concatenate([b, new_b])[new_idx] * x
    obs = pm.Normal("obs", mu, dims=("obs_idx",))

    idata = pm.sample_posterior_predictive(idata, var_names=["obs"], predictions=True, extend_inferencedata=True)

The only line I changed was mu = pm.math.concatenate([b, new_b])[new_idx] * x because you were doing invalid index with new_idx.

The concatenate may be wrong how I did it, but the ideas is to get all the b in a single vector.

1 Like