# Computing Prediction Intervals without sampling

Let’s assume we have a model similar to the following definition:

```python
import pymc3 as pm

with pm.Model() as model:
    coeff = pm.Normal('coeff', mu=0, sd=1)
    var = pm.Normal('var', mu=observed_var.mean(), sd=observed_var.std(), observed=observed_var)
    sd = pm.HalfNormal('sd', sd=1)
    outcome = pm.Normal('outcome', mu=coeff*var, sd=sd, observed=outcomes)
    trace = pm.sample(5000)
```

For now, after sampling I use the posterior mean of each parameter in the trace (`coeff` and `sd`) to derive the linear model for production. The prediction is then just the input multiplied by the mean of `coeff`.
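A minimal sketch of that point-estimate workflow, using synthetic numpy arrays as hypothetical stand-ins for `trace['coeff']` and `trace['sd']` (in practice these would come from `pm.sample` above):

```python
import numpy as np

# Hypothetical posterior draws standing in for trace['coeff'] and trace['sd']
rng = np.random.default_rng(0)
coeff_samples = rng.normal(2.0, 0.1, size=5000)
sd_samples = np.abs(rng.normal(0.5, 0.05, size=5000))

# Point estimates persisted for the production service
coeff_mean = coeff_samples.mean()
sd_mean = sd_samples.mean()

def predict(x, coeff=coeff_mean):
    """Point prediction of the linear model: mu = coeff * x."""
    return coeff * x
```

Only the two scalars need to be persisted, which is what makes the approach attractive for a lightweight service; the question is how to recover an uncertainty estimate from them.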

How would you compute the error from that without having to persist the trace, knowing that the `sd` of `outcome` is separate from `mu`?

I am not sure the model specification is correct in the first place: `sd` can attain negative values, so you probably want to bound that normal distribution if you want to keep using it for `sd`.

In reply to your question: couldn’t you save the 95% HPD for `coeff` and `sd` just like you are saving the means? Then the same operation you perform with the means of `coeff` and `sd`, applied to the HPD interval limits, should give you the posterior uncertainty.
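For illustration, the HPD limits can be computed from the posterior samples as the shortest interval containing 95% of the draws. This is a simple numpy stand-in for the PyMC3 HPD utility, not the library's actual implementation:

```python
import numpy as np

def hpd(samples, alpha=0.05):
    """Shortest interval containing (1 - alpha) of the samples
    (a simplified stand-in for PyMC3's HPD computation)."""
    sorted_s = np.sort(samples)
    n = len(sorted_s)
    k = int(np.floor((1 - alpha) * n))
    # Width of every candidate interval spanning k consecutive order statistics
    widths = sorted_s[k:] - sorted_s[:n - k]
    i = int(np.argmin(widths))
    return sorted_s[i], sorted_s[i + k]
```

The two scalars per parameter could then be persisted alongside the means, keeping the production service free of any PyMC3 dependency.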

@narendramukherjee’s solution might work if your posterior does not have strong correlations. Otherwise, you should save the trace, as the correct computation uses all the information from the posterior (this applies to any case where you are interested in the expectation of some function over the posterior). Besides, if you have high-quality samples, 500 draws are sufficient for a small model like this, so you don’t actually have to save many samples.
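To sketch why saving joint draws handles correlation where separate per-parameter intervals do not: keeping whole rows of the trace preserves the coeff/sd pairings, so the predictive interval reflects their joint distribution. The numbers below are hypothetical correlated draws, not output from the model above:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical correlated posterior draws for (coeff, sd)
cov = [[0.04, 0.015], [0.015, 0.01]]
draws = rng.multivariate_normal([2.0, 0.5], cov, size=5000)

# Thin to 500 joint draws -- rows keep the coeff/sd pairing, so correlation survives
idx = rng.choice(len(draws), size=500, replace=False)
saved = draws[idx]

# Posterior predictive draws at a new input: mu = coeff * x, noise sd per draw
x_new = 3.0
preds = rng.normal(saved[:, 0] * x_new, np.abs(saved[:, 1]))
lo, hi = np.percentile(preds, [2.5, 97.5])
```

Persisting 500 rows per model is a few kilobytes, which may still be manageable even across 100k+ models.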

@narendramukherjee Sorry for the unconstrained `sd`, that was just example code; I have changed it to a HalfNormal.

Some variables are, unfortunately, (too) correlated. I actually ran it with 1000 samples; there are 30 features in total, and the dataset usually contains around 700 data points. I have the feeling the correlations would make me overestimate the width of the posterior credible interval.

@junpenglao The reason I didn’t want to use `sample_ppc` for this is that predictions are delivered by a microservice, where PyMC3 would probably be too heavy a dependency. Moreover, the model is retrained many times (there are 100k+ models, i.e. traces, in total).

I thought about going frequentist for the range estimation: computing the standard error of the residuals on a held-out sample and assuming the residuals are normally distributed.
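That idea could look something like the following sketch, with hypothetical held-out data and a persisted point estimate (the names and numbers are illustrative, not from the actual model):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical held-out data with a known linear relationship
x_holdout = rng.uniform(0, 10, size=200)
y_holdout = 2.0 * x_holdout + rng.normal(0, 0.5, size=200)

coeff_mean = 2.0  # point estimate persisted from training

# Standard error of prediction, estimated from held-out residuals
residuals = y_holdout - coeff_mean * x_holdout
se = residuals.std(ddof=1)

def interval(x, z=1.96):
    """Approximate 95% prediction interval, assuming normal residuals."""
    mu = coeff_mean * x
    return mu - z * se, mu + z * se
```

Only `coeff_mean` and `se` need to be stored per model, so the production side stays tiny.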

A frequentist model plus a bootstrap should also work quite well in that case.
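A minimal bootstrap sketch for the coefficient of a through-the-origin linear fit, on synthetic data standing in for the real features and outcomes:

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical training data (~700 points, as in the thread)
x = rng.uniform(0, 10, size=700)
y = 2.0 * x + rng.normal(0, 0.5, size=700)

# Bootstrap the least-squares coefficient by resampling rows with replacement
boot_coeffs = []
for _ in range(1000):
    idx = rng.integers(0, len(x), size=len(x))
    xb, yb = x[idx], y[idx]
    boot_coeffs.append((xb @ yb) / (xb @ xb))  # closed-form slope, no intercept
boot_coeffs = np.array(boot_coeffs)

# Percentile interval for the coefficient
lo, hi = np.percentile(boot_coeffs, [2.5, 97.5])
```

The bootstrap distribution can also be summarized by a handful of scalars (or a small array of resampled coefficients), which keeps the production footprint small.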