Get out-of-sample posterior predictive for mean

I am playing around with using BART to predict horse speed (y_pred) using some old horse racing data. Here is my model:

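import pymc as pm
import pymc_bart as pmb

with pm.Model() as model_jockey:
    # Register the predictors as mutable data so they can be swapped for x_test later
    X = pm.MutableData("X", x_train)
    Y = y_data
    mu = pmb.BART("mu", X=X, Y=Y, m=10)  # up this to at least ?100? for production
    sigma = pm.HalfNormal("sigma", sigma=0.25)
    # shape follows X so the likelihood resizes when the data is replaced
    y_pred = pm.StudentT("y_pred", mu=mu, sigma=sigma, nu=2, observed=Y, shape=X.shape[0])
    idata_jockey = pm.sample(random_seed=RANDOM_SEED)  # , initvals=initial_values)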

I am able to predict y_pred for out of sample data using this code:

with model_jockey:
    pm.set_data({"X": x_test})
    ppc2 = pm.sample_posterior_predictive(
        trace=idata_jockey, random_seed=RANDOM_SEED,
        extend_inferencedata=True, predictions = True
    )

But for the life of me, I cannot figure out how to get out-of-sample predictions for “mu”. How can I get the “mu” values that are used to sample from the posterior predictive when the y_pred values are generated in the PPC?

Welcome!

Would this work (it might not, I’m doing this from memory)?

ppc2 = pm.sample_posterior_predictive(
    trace=idata_jockey, random_seed=RANDOM_SEED,
    extend_inferencedata=True, predictions=True,
    var_names=["mu"]
)

Almost. I got an error, which I think was caused by extend_inferencedata=True. Dropping that argument works:

with model_jockey:
    pm.set_data({"X": x_test})
    ppc2 = pm.sample_posterior_predictive(
        trace=idata_jockey, var_names = ["mu"],
        random_seed=RANDOM_SEED,
        predictions = True
    )
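For anyone landing here later, a rough sketch of how the resulting “mu” draws could be summarized per test row. This assumes the draws live in ppc2.predictions["mu"] with shape (chain, draw, n_test_rows). The array below is synthetic stand-in data (the names mu_draws, n_chains, n_draws and n_test are made up for illustration), so it runs without the model:

```python
import numpy as np

# Synthetic stand-in for ppc2.predictions["mu"].values (an assumption, not real output).
# Assumed layout: (chain, draw, n_test_rows).
rng = np.random.default_rng(0)
n_chains, n_draws, n_test = 4, 250, 5
mu_draws = rng.normal(loc=15.0, scale=0.5, size=(n_chains, n_draws, n_test))

# Pool chains and draws so each column holds every posterior draw for one test row.
mu_flat = mu_draws.reshape(-1, n_test)

# Point prediction and a 94% equal-tailed interval for each test row.
mu_mean = mu_flat.mean(axis=0)
mu_lower, mu_upper = np.percentile(mu_flat, [3, 97], axis=0)

for i in range(n_test):
    print(f"row {i}: mean={mu_mean[i]:.2f}, "
          f"94% interval=({mu_lower[i]:.2f}, {mu_upper[i]:.2f})")
```

With the real output, replacing the synthetic array with ppc2.predictions["mu"].values should give one mean and one interval per row of x_test.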

I started reading this excellent documentation here (should have found it earlier): Out of model predictions with PyMC - PyMC Labs

Thanks for the help!!
