How do I predict on new, unseen real data using pm.sample_posterior_predictive?

Schimidt99 · December 22, 2020, 9:59pm

Hello! I am trying to do a simple multivariate regression using Bayesian modeling. I am using real data from a CSV table. I am able to set up the model and sample from the posterior, but I am confused about how to actually generate new predictions from new Xi data.

My training data have one Y (output) and 10 Xi inputs (i = 1 to 10). All X predictors are standardized.

I specified the parameters:

dY : Y output data

dX1 : 1st X column data
dX2 : 2nd X column data
…
dX10 : 10th X column data

My model:

import pymc3 as pm

# dY and dX1 ... dX10 are the data columns described above
with pm.Model() as model:
    a = pm.Normal('a', mu=dY.mean(), sd=10)
    B = pm.Normal('B', mu=0, sd=10, shape=10)
    sigma = pm.Uniform('sigma', lower=0, upper=10)
    mu = pm.Deterministic('mu', a + B[0] * dX1 + B[1] * dX2 + B[2] * dX3 + B[3] * dX4 +
                          B[4] * dX5 + B[5] * dX6 + B[6] * dX7 + B[7] * dX8 + B[8] * dX9 + B[9] * dX10)

    Y = pm.Normal('Y', mu=mu, sd=sigma, observed=dY)
    trace = pm.sample(1000, tune=1000)

When I use:

> Y_pred = pm.sample_posterior_predictive(trace, samples=1000, model=model)['Y']

I get Y_pred values generated by the model from the original Xi data.

If I want to predict new Y values from new Xi values, how should I use pm.sample_posterior_predictive?

cluhmann · December 23, 2020, 3:43pm

You can use set_data() to swap out the data you used for inference for something new (e.g., out-of-sample test data) before running sample_posterior_predictive. That will allow you to use your estimated model parameters to generate predictions about your outcome (i.e., Y in your case) in a new scenario (i.e., for new values of dX1, dX2, etc.).

This notebook may be of additional use to you.

[Edit: documentation links updated]

ccaprani · December 24, 2020, 2:15am

Thanks for this. I am working on preposterior analysis and was trying to figure out dealing with hypothetical data for value of information analysis using pymc3.

Is it always good practice to use pm.Data to make the model data-aware? I don’t see it being done much, though.

AlexAndorra · December 24, 2020, 9:48am

Hi Colin,
Yep, it’s usually the first thing to try. There definitely are limitations to the Data container, but being able to use it makes everything easier.
You can take a look at this notebook for an introduction, and at this one for many examples.
Hope this helps

ccaprani · December 24, 2020, 9:56am

Fantastic - thanks @AlexAndorra, et joyeux noël!

AlexAndorra · December 24, 2020, 10:42am

Ha ha thanks, you too – and thanks for the support on the podcast

cluhmann · December 24, 2020, 7:40pm

My own use of the data container strongly depends on the model, the data, and the overall scenario. For simple models/data set-ups and/or when I am generating posterior predictions for quick diagnostic purposes, I often just plug samples into a “new” instantiation of the model. But as things get more complex, the data container starts to be much more convenient because you can re-use the model you already implemented.
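For a plain linear regression like the one in this thread, one way to generate predictions without the data container is to apply the posterior draws to the new predictors by hand. The sketch below is only an illustration of that idea and not necessarily the workflow described above. It uses NumPy only, and the arrays a_draws, B_draws and sigma_draws are made-up stand-ins; with a real fit they would come from the trace (for example trace['a'], trace['B'], trace['sigma']).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-ins for posterior draws of the regression parameters.
n_draws, n_predictors = 1000, 10
a_draws = rng.normal(2.0, 0.1, size=n_draws)
B_draws = rng.normal(0.5, 0.05, size=(n_draws, n_predictors))
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=n_draws))

# Hypothetical new, standardized predictors: 5 rows, 10 columns.
X_new = rng.normal(size=(5, n_predictors))

# Linear predictor for every (draw, new row) pair: shape (n_draws, 5).
mu_new = a_draws[:, None] + B_draws @ X_new.T

# One simulated Y per draw and new row, adding the observation noise.
Y_pred = rng.normal(loc=mu_new, scale=sigma_draws[:, None])

print(Y_pred.mean(axis=0))                          # point prediction per new row
print(np.percentile(Y_pred, [2.5, 97.5], axis=0))   # 95% predictive interval
```

This stays manageable for a simple likelihood, but every change to the model has to be repeated in the hand-written prediction code, which is where re-using the model through the data container becomes more convenient.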

Schimidt99 · December 24, 2020, 11:28pm

Thanks for your answer cluhmann!!
But when I tested your example in Spyder, I got this error:

AttributeError: module 'pymc3' has no attribute 'set_data'

What happened? Is it a PyMC3 version problem?
Thank you very much!!

cluhmann · December 25, 2020, 12:37am

Possible, though it seems unlikely. What version of pymc3 are you using? And can you provide a snippet of code where set_data() fails for you?

Schimidt99 · January 4, 2021, 6:49pm

Hello cluhmann, sorry for the delay! I am using the same example that appears at the link: “https://docs.pymc.io/api/model.html#pymc3.model.set_data”:

import pymc3 as pm
print(f"Running on PyMC3 v{pm.__version__}")

with pm.Model() as model:
    x = pm.Data('x', [1., 2., 3.])
    y = pm.Data('y', [1., 2., 3.])
    beta = pm.Normal('beta', 0, 1)
    obs = pm.Normal('obs', x * beta, 1, observed=y)
    trace = pm.sample(1000, tune=1000)
    
        
with model:
    pm.set_data({'x': [5., 6., 9.]})
    y_test = pm.sample_posterior_predictive(trace)
    y_test['obs'].mean(axis=0)

The output shows:

Running on PyMC3 v3.6

File "C:/Backup_Fernando/DeepLearning/Spyder/teste_set_data2.py", line 14, in <module>
    x = pm.Data('x', [1., 2., 3.])

AttributeError: module 'pymc3' has no attribute 'Data'

How can I fix this?
Thanks a lot for any help!!

cluhmann · January 5, 2021, 10:19pm

Hm. 3.6 is now 2 years old, so it might actually be old enough to not include the pm.Data/pm.set_data() functionality. Unless you have some particular reason not to, I would update (or install a fresh copy of 3.10 in a new virtual environment).

Schimidt99 · January 6, 2021, 6:21pm

Thank you very much Christian!!
I will update my environment!

Schimidt99 · January 6, 2021, 9:49pm

Dear Christian, do you have any idea how to use pm.set_data() with a data frame table?
Thank you!

cluhmann · January 7, 2021, 12:20am

A pandas dataframe? Something like this should work:

    pm.set_data( {'my_observed_variable': df['my_column'].to_numpy()} )
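With several predictor columns, the same idea extends to selecting a list of columns at once. The sketch below is a hedged illustration: the tables, the column names and the container name "X" are all hypothetical. It also assumes the training predictors were standardized in code, in which case the new rows should be scaled with the training mean and standard deviation rather than their own.

```python
import pandas as pd

# Hypothetical training and new-data tables; column names are placeholders.
df_train = pd.DataFrame({"X1": [1.0, 2.0, 3.0, 4.0], "X2": [10.0, 20.0, 15.0, 5.0]})
df_new = pd.DataFrame({"X1": [2.5, 5.0], "X2": [12.0, 8.0]})

predictor_cols = ["X1", "X2"]

# Standardize the new rows with the training statistics so they are on the
# same scale the model was fitted on.
train_mean = df_train[predictor_cols].mean()
train_std = df_train[predictor_cols].std()
X_new = ((df_new[predictor_cols] - train_mean) / train_std).to_numpy()

# Dictionary in the form pm.set_data() takes: container name -> array.
# "X" is a hypothetical pm.Data name; inside the model context this would be
# passed as pm.set_data(new_data).
new_data = {"X": X_new}
print(X_new.shape)
```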
