How to properly do out-of-sample prediction for hierarchical model

Why did you say that can’t be right? You shouldn’t have to change the dims of RVs that were already present in the original model, even if not all of them are used for posterior predictive.

Ok, so “id” stays [0, 1], but then new_idx has to change to [1, 1, 2, 3] if we want to predict on one previous group (1) and two new groups (2, 3), correct?
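(For context, the kind of originally fitted model being discussed would look roughly like this; the data values, the observed values, and the idx variable are made-up placeholders for illustration.)

import pymc as pm

# Training setup: 4 observations assigned to the two original groups via idx
idx = [0, 0, 1, 1]
coords = {
    "id": [0, 1],
    "obs_idx": list(range(len(idx))),
}
with pm.Model(coords=coords) as m:
    x = pm.Data("x", [1.0, 2.0, 3.0, 4.0], dims=("obs_idx",))
    mu_group = pm.Normal("mu_group")
    b = pm.Normal("b", mu_group, dims=("id",))
    mu = b[idx] * x
    # Toy observed values, only for illustration
    obs = pm.Normal("obs", mu, observed=[1.1, 2.1, 2.9, 4.2], dims=("obs_idx",))
    idata = pm.sample()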

So, for example, this code works:

# Predict two new groups
new_idx = [1, 1, 2, 3]
new_coords = {
    "id":[0, 1],
    "new_id": [2, 3],
    "obs_idx": list(range(len(new_idx))),
}
with pm.Model(coords=new_coords) as pred_m:
    x = pm.Data("x", [0, 0, 0, 0], dims=("obs_idx",))
    mu_group = pm.Normal("mu_group")
    # Needs a new name, so old b is not used!
    new_b = pm.Normal("new_b", mu_group, dims=("new_id",))
    b = pm.Normal("b", mu_group, dims=("id",))
    mu = pm.math.concatenate([new_b[new_idx], b[idx]]) * x
    obs = pm.Normal("obs", mu, dims=("obs_idx",))

    idata = pm.sample_posterior_predictive(idata, var_names=["obs"], predictions=True, extend_inferencedata=True)

Indexing should work just like numpy. You can either index the old and new groups separately, each with its own zero-based index (and ignore as many entries of either as you want), or you can use a single zero-based index over all groups combined, if you concatenate b and new_b first and then index, instead of indexing first and then concatenating.
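To make that concrete, here is a small numpy-only sketch of the two options (the numbers are made up, just to show the index bookkeeping):

import numpy as np

b = np.array([10.0, 11.0])      # stand-ins for the two old group effects (groups 0 and 1)
new_b = np.array([20.0, 21.0])  # stand-ins for the two new group effects (groups 2 and 3)

# Option 1: index each vector separately, each with its own zero-based index, then concatenate
old_idx = [1, 1]  # two observations from previous group 1
new_idx = [0, 1]  # one observation from each new group
option_1 = np.concatenate([b[old_idx], new_b[new_idx]])  # [11., 11., 20., 21.]

# Option 2: concatenate first, then use one zero-based index over all four groups
all_b = np.concatenate([b, new_b])  # positions 0-1 are the old groups, 2-3 the new ones
option_2 = all_b[[1, 1, 2, 3]]      # [11., 11., 20., 21.]

assert (option_1 == option_2).all()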

Your last example is odd, because you are indexing new_b, which only has two entries, with a new_idx whose values go from 1 to 3. Sounds like you want the concatenate-first-and-then-index logic?

I’m more confused than ever. How should I write the prediction model if I want to predict on 1 of the previous groups and 2 new groups?

What you were doing was in the right direction; just the indexing part seemed to be wrong, that’s all. Here is how I think it would work:

# Predict two new groups
new_idx = [1, 1, 2, 3]
new_coords = {
    "id":[0, 1],
    "new_id": [2, 3],
    "obs_idx": list(range(len(new_idx))),
}
with pm.Model(coords=new_coords) as pred_m:
    x = pm.Data("x", [0, 0, 0, 0], dims=("obs_idx",))
    mu_group = pm.Normal("mu_group")
    # Needs a new name, so old b is not used!
    new_b = pm.Normal("new_b", mu_group, dims=("new_id",))
    b = pm.Normal("b", mu_group, dims=("id",))
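    # In the concatenation below, b covers positions 0-1 (old groups) and new_b positions 2-3 (new groups)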
    mu = pm.math.concatenate([b, new_b])[new_idx] * x
    obs = pm.Normal("obs", mu, dims=("obs_idx",))

    idata = pm.sample_posterior_predictive(idata, var_names=["obs"], predictions=True, extend_inferencedata=True)

The only line I changed was mu = pm.math.concatenate([b, new_b])[new_idx] * x, because you were doing an invalid index into new_b with new_idx.

I may have gotten the concatenate wrong as I wrote it, but the idea is to get all of the b’s into a single vector.
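Once that runs, because predictions=True the draws end up in the predictions group of the InferenceData, so you can sanity-check that you get one draw per entry of new_idx with something like:

# "obs" has dims (chain, draw, obs_idx), with obs_idx of length 4 here
idata.predictions["obs"]
# Posterior predictive mean for each of the four rows of new_idx
idata.predictions["obs"].mean(("chain", "draw"))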
