# Computing Prediction Intervals without sampling

Let’s assume we have a model similar to the following definition:

```python
import pymc3 as pm

with pm.Model() as model:
    coeff = pm.Normal('coeff', mu=0, sd=1)
    var = pm.Normal('var', mu=observed_var.mean(), sd=observed_var.std(), observed=observed_var)
    sd = pm.HalfNormal('sd', sd=1)
    outcome = pm.Normal('outcome', mu=coeff*var, sd=sd, observed=outcomes)
    trace = pm.sample(5000)
```

For now, after sampling I use the posterior mean of each parameter in the trace (`coeff` and `sd`) to derive the linear model for production. The prediction is then just the input multiplied by the mean of `coeff`.
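A minimal sketch of that point-estimate workflow, using synthetic numpy arrays as hypothetical stand-ins for `trace['coeff']` and `trace['sd']` (in practice these would come from `pm.sample` above):

```python
import numpy as np

# Hypothetical posterior draws standing in for trace['coeff'] and trace['sd']
rng = np.random.default_rng(0)
coeff_samples = rng.normal(2.0, 0.1, size=5000)
sd_samples = np.abs(rng.normal(0.5, 0.05, size=5000))

# Point estimates persisted for the production service
coeff_mean = coeff_samples.mean()
sd_mean = sd_samples.mean()

def predict(x, coeff=coeff_mean):
    """Point prediction of the linear model: mu = coeff * x."""
    return coeff * x
```

Only the two scalars need to be persisted, which is what makes the approach attractive for a lightweight service; the question is how to recover an uncertainty estimate from them.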

How would you compute the error from that without having to persist the trace, knowing that the `sd` of `outcome` is separate from `mu`?

I am not sure the model specification is correct in the first place: `sd` can attain negative values, so you probably want to bound that normal distribution if you want to keep using it for `sd`.

In reply to your question: couldn’t you save the 95% HPD for `coeff` and `sd` just like you are saving the means? Then the same operation you perform with the means of `coeff` and `sd`, applied to the HPD interval limits, should give you the posterior uncertainty.
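For illustration, the HPD limits can be computed from the posterior samples as the shortest interval containing 95% of the draws. This is a simple numpy stand-in for the PyMC3 HPD utility, not the library's actual implementation:

```python
import numpy as np

def hpd(samples, alpha=0.05):
    """Shortest interval containing (1 - alpha) of the samples
    (a simplified stand-in for PyMC3's HPD computation)."""
    sorted_s = np.sort(samples)
    n = len(sorted_s)
    k = int(np.floor((1 - alpha) * n))
    # Width of every candidate interval spanning k consecutive order statistics
    widths = sorted_s[k:] - sorted_s[:n - k]
    i = int(np.argmin(widths))
    return sorted_s[i], sorted_s[i + k]
```

The two scalars per parameter could then be persisted alongside the means, keeping the production service free of any PyMC3 dependency.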

@narendramukherjee’s solution might work if your posterior does not have strong correlations. Otherwise, you should save the trace, as the correct computation uses all the information from the posterior (this applies to any case where you are interested in the expectation of some function over the posterior). Besides, if you have high-quality samples, 500 draws are sufficient for a small model like this, so you don’t actually have to save many samples.
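To sketch why saving joint draws handles correlation where separate per-parameter intervals do not: keeping whole rows of the trace preserves the coeff/sd pairings, so the predictive interval reflects their joint distribution. The numbers below are hypothetical correlated draws, not output from the model above:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated posterior draws for (coeff, sd)
cov = [[0.04, 0.015], [0.015, 0.01]]
draws = rng.multivariate_normal([2.0, 0.5], cov, size=5000)

# Thin to 500 joint draws -- rows keep the coeff/sd pairing, so correlation survives
idx = rng.choice(len(draws), size=500, replace=False)
saved = draws[idx]

# Posterior predictive draws at a new input: mu = coeff * x, noise sd per draw
x_new = 3.0
preds = rng.normal(saved[:, 0] * x_new, np.abs(saved[:, 1]))
lo, hi = np.percentile(preds, [2.5, 97.5])
```

Persisting 500 rows per model is a few kilobytes, which may still be manageable even across 100k+ models.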

@narendramukherjee Sorry for the unconstrained `sd`, that was just example code; I have changed it to a HalfNormal.

Some variables are, unfortunately, (too) correlated. I actually ran it with 1000 samples; there are 30 features in total, and the dataset usually contains around 700 data points. I have the feeling the correlations would make me overestimate the width of the posterior credible interval.

@junpenglao The reason I didn’t want to use `sample_ppc` for this is that predictions are delivered by a microservice, where PyMC3 would probably be too heavy a dependency. Moreover, the model is retrained many times (there are 100k+ models, i.e. traces, in total).

I thought about going frequentist for the range estimation: computing the standard error of the residuals on a held-out sample and assuming the residuals are normally distributed.
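That idea could look something like the following sketch, with hypothetical held-out data and a persisted point estimate (the names and numbers are illustrative, not from the actual model):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical held-out data with a known linear relationship
x_holdout = rng.uniform(0, 10, size=200)
y_holdout = 2.0 * x_holdout + rng.normal(0, 0.5, size=200)

coeff_mean = 2.0  # point estimate persisted from training

# Standard error of prediction, estimated from held-out residuals
residuals = y_holdout - coeff_mean * x_holdout
se = residuals.std(ddof=1)

def interval(x, z=1.96):
    """Approximate 95% prediction interval, assuming normal residuals."""
    mu = coeff_mean * x
    return mu - z * se, mu + z * se
```

Only `coeff_mean` and `se` need to be stored per model, so the production side stays tiny.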

A frequentist model plus a bootstrap should also work quite well in that case.
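A minimal bootstrap sketch for the coefficient of a through-the-origin linear fit, on synthetic data standing in for the real features and outcomes:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical training data (~700 points, as in the thread)
x = rng.uniform(0, 10, size=700)
y = 2.0 * x + rng.normal(0, 0.5, size=700)

# Bootstrap the least-squares coefficient by resampling rows with replacement
boot_coeffs = []
for _ in range(1000):
    idx = rng.integers(0, len(x), size=len(x))
    xb, yb = x[idx], y[idx]
    boot_coeffs.append((xb @ yb) / (xb @ xb))  # closed-form slope, no intercept
boot_coeffs = np.array(boot_coeffs)

# Percentile interval for the coefficient
lo, hi = np.percentile(boot_coeffs, [2.5, 97.5])
```

The bootstrap distribution can also be summarized by a handful of scalars (or a small array of resampled coefficients), which keeps the production footprint small.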