Given a trace of a fitted model, I would like to make predictions on new data using the posterior predictive function. It's a regression problem using a GAM. However, in all the examples I've seen [spline example] [classification example], sampling from the posterior predictive requires providing values for the target variable Y, which I don't have.
Is there a way to get predictions for arbitrary explanatory variable values without having to construct any target values? I just want to predict Y given X. Simply multiplying the posterior estimates of the parameters with the data doesn't seem right, or is that the way to do it?
You don't need targets for out-of-sample prediction. You just need to use pm.set_data to change the input data, then call pm.sample_posterior_predictive. The tutorial on pm.Data might be helpful.
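For instance, a minimal sketch of that workflow (model1, trace, X_new, and the "X" container name are placeholders for your own, not from the tutorial):

with model1:
    # replace the shared input data with the new explanatory values
    pm.set_data({"X": X_new})
    # Y is sampled forward from the model; no targets needed
    post_pred = pm.sample_posterior_predictive(trace)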
You're overanalyzing the Y requirement: it has nothing to do with needing targets at prediction time; it's simply an artifact of how the model graph was constructed.
In PyMC, posterior predictive sampling only requires the observed variable to be present in the graph; it does not use the observed Y values when predicting on fresh data. It may seem counterintuitive, but the examples still define Y because the likelihood node has to exist in the graph; when you swap in new X, that node is evaluated forward using the sampled parameters rather than conditioned on Y.
The proper procedure is (sketched in code after this list):
1. Define X as shared data (pm.Data).
2. Fit the GAM once.
3. Swap in the new X values with pm.set_data.
4. Call pm.sample_posterior_predictive.
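A minimal sketch of those four steps, with a simplified linear mean standing in for the GAM's spline terms (all data and variable names here are illustrative):

import numpy as np
import pymc as pm

# toy stand-in data for the real GAM inputs
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.3, size=100)

with pm.Model() as model1:
    X = pm.Data("X", X_train)                       # 1. shared input container
    beta = pm.Normal("beta", 0, 1, shape=3)
    sigma = pm.HalfNormal("sigma", 1)
    mu = pm.Deterministic("mu", pm.math.dot(X, beta))
    # shape=mu.shape (recent PyMC versions) ties the predictive shape to X,
    # so new X can have a different number of rows and no new Y is needed
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y_train, shape=mu.shape)
    trace = pm.sample()                             # 2. fit once

with model1:
    pm.set_data({"X": rng.normal(size=(50, 3))})    # 3. swap in new X values
    post_pred = pm.sample_posterior_predictive(trace)  # 4. sample predictions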
No dummy Y array, no fake targets: PyMC will generate Y automatically from the posterior predictive distribution.
Indeed, manually multiplying posterior means by X is the wrong approach here: it ignores the GAM's nonlinear spline structure and collapses all the uncertainty to a point. The whole point of posterior predictive sampling is to propagate both parameter uncertainty and observation noise.
The short answer is that you can definitely predict Y given just X, which is precisely what the posterior predictive is for. Code that "requires" Y reflects how the model graph is built, not a logical necessity.
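To see that uncertainty is preserved, you can summarize the predictive draws with an interval rather than a single point, e.g. with ArviZ (assuming the likelihood variable is named "y_obs", as in the sketch above):

import arviz as az

# 94% highest-density interval per observation, computed over all
# chains and draws of the posterior predictive samples
hdi = az.hdi(post_pred.posterior_predictive["y_obs"])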
with model1:
    # 'mu' is the deterministic GAM mean; use the likelihood variable
    # instead if you want predictions that include observation noise
    post_pred = pm.sample_posterior_predictive(trace, var_names=["mu"])

# average over chains and draws for one point prediction per row
y_pred = post_pred.posterior_predictive["mu"].mean(dim=("chain", "draw")).to_dataframe()["mu"]
You can then make any new dataframe you want for X, as long as it has the same columns in the same order. (I use pandas DataFrames; I'm not sure whether the names have to be exactly the same, but it's probably best that they are.) In my case I am using a Latin hypercube sample called X_lhs:
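Something along these lines (a sketch; it assumes the shared container was registered under the name "X"):

with model1:
    # swap in the Latin hypercube design; pm.set_data expects array values
    pm.set_data({"X": X_lhs.to_numpy()})
    post_pred = pm.sample_posterior_predictive(trace, var_names=["mu"])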