Model scoring on out-of-sample predictions

Hi,
I am trying to score a PyMC model on different datasets. Here is how I am generating the trace:

import pymc as pm
import pymc.sampling.jax  # explicit submodule import; needed on some PyMC versions

with model_factory(df):
    # Sample the posterior with the NumPyro NUTS backend
    trace = pm.sampling.jax.sample_numpyro_nuts(
        draws=1000,
        tune=1000,
        chains=4,
        random_seed=1111,
        target_accept=0.99,
    )

I need to sum all the values of the obs variable after taking the mean over draws. Here is how I am feeding in the test dataset. Is this the right way to get predictions on a test dataset?

with model_factory(df):
    # Swap the test inputs into the shared data container, then
    # draw from the posterior predictive using the existing trace
    pm.set_data({"input_data": test_df})
    trace_new = pm.sample_posterior_predictive(trace)

# Point estimate: mean over chains and draws, then sum over observations
trace_new.posterior_predictive["obs"].mean(dim=["chain", "draw"]).sum().values

Yes, this looks correct. But it’s a bit of a waste to reduce your posterior to a point estimate. If you’re going to do that, why bother running the expensive, time-consuming MCMC algorithm in the first place?

You should consider computing a model score for each posterior draw and looking at the posterior distribution of the scoring function.
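
For example, you could compute an RMSE for each draw and look at the whole distribution of scores rather than a single number. Here is a minimal sketch, assuming test_df has a "y" column with the held-out targets (that column name is hypothetical) and trace_new comes from the posterior predictive call above:

import numpy as np

# Posterior predictive samples of `obs`, with dims ("chain", "draw", ...)
pp = trace_new.posterior_predictive["obs"]

# Squared error against the held-out targets ("y" is an assumed column name)
err2 = (pp - test_df["y"].to_numpy()) ** 2

# Average over every dim except chain/draw, then take the square root:
# one RMSE value per posterior draw instead of a single point estimate
obs_dims = [d for d in err2.dims if d not in ("chain", "draw")]
rmse = np.sqrt(err2.mean(dim=obs_dims))

# rmse has dims ("chain", "draw"); summarize the distribution of scores
print(rmse.mean().values, rmse.quantile([0.05, 0.95]).values)

The same pattern works for any pointwise score (MAE, log predictive density, etc.): compute it per draw and report the spread, not just a single summary value.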