# Computing Credible Intervals for Predictions

I want to make sure that the prediction intervals are being computed correctly. Could the good people on this platform help me out?

Model:

```python
import pymc3 as pm

def wide(X_train, y_train, model_type):
    M = X_train.shape[1]
    with pm.Model() as model:
        β = pm.Normal("β", mu=0.0, sd=10.0, shape=M)
        β0 = pm.Normal("β0", 0.0, 10.0)
        σ = pm.HalfNormal("σ", 2.5)
        pred = pm.Data("pred", X_train)
        mu = pm.Deterministic("mu", β0 + pm.math.dot(pred, β))
        obs = pm.Normal("obs", mu, σ, observed=y_train)
    return model
```

Prediction:

```python
def predict(model, trace, X_all):
    with model:
        # update the values of the predictors:
        pm.set_data({"pred": X_all})
        # use the updated values to predict outcomes:
        posterior_predictive = pm.sample_posterior_predictive(
            trace, var_names=["mu"], random_seed=42, samples=1000,
        )  # should this be "obs" in place of "mu"? (see Q2)
    model_preds = posterior_predictive["mu"]  # likewise "obs" in place of "mu"?
    return model_preds, posterior_predictive
```
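The practical difference between sampling `mu` and sampling `obs` can be illustrated with plain NumPy, independently of the model above. The draws below are hypothetical stand-ins for posterior samples (the names `mu_draws`, `sigma_draws`, and the specific values are assumptions for illustration): `obs`-style draws add the likelihood noise σ on top of `mu`, so their interval is wider.

```python
import numpy as np

rng = np.random.default_rng(42)
n_draws = 4000

# hypothetical posterior draws for a single prediction point
mu_draws = rng.normal(5.0, 0.2, size=n_draws)              # uncertainty in the mean
sigma_draws = np.abs(rng.normal(1.0, 0.05, size=n_draws))  # residual sd draws

# "obs"-style draws layer observation noise on top of mu
obs_draws = rng.normal(mu_draws, sigma_draws)

# equal-tailed 95% intervals for both
mu_lo, mu_hi = np.percentile(mu_draws, [2.5, 97.5])
obs_lo, obs_hi = np.percentile(obs_draws, [2.5, 97.5])
```

The `mu` interval is a credible interval for the regression mean; the `obs` interval is a predictive interval for a new observation.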

Get the predicted y:

```python
import arviz as az

y_pred = model_preds.mean(axis=0)

bands = az.hdi(model_preds, hdi_prob=0.95)  # shape (n_points, 2)
lower = bands[:, 0]
upper = bands[:, 1]
```

(Note: the original `bands[0]` / `bands[1]` indexed the first two *points*, not the lower/upper bounds; indexing the columns makes this consistent with the form below.)

or equivalently

```python
lower = az.hdi(model_preds, hdi_prob=0.95)[:, 0]
upper = az.hdi(model_preds, hdi_prob=0.95)[:, 1]
```
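As a quick sanity check on the HDI values, an equal-tailed interval computed with `np.percentile` should be close to the HDI for roughly symmetric posteriors. A minimal sketch, using a random array as a stand-in for `model_preds` (shape `(n_samples, n_points)` is an assumption about the posterior-predictive output):

```python
import numpy as np

rng = np.random.default_rng(0)
# stand-in for model_preds: (n_samples, n_points)
model_preds = rng.normal(0.0, 1.0, size=(4000, 3))

# equal-tailed 95% interval, computed column-wise over the sample axis
lower, upper = np.percentile(model_preds, [2.5, 97.5], axis=0)
```

If these disagree badly with the HDI bounds, that would suggest the sample axis is being interpreted differently than expected.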

Here `upper` and `lower` are the cutoffs for the upper and lower bounds of the credible intervals.

For some simple models, this seems to yield larger credible intervals than frequentist methods do. Given my rather large prior standard deviations, I am surprised, so I wanted to check whether this makes sense.

Q1) Is the above implementation correct?

Q2) Should I use `mu` in place of `obs` to get the predicted value? `mu` captures the predicted value for any particular combination of estimated parameters, so adding the additional step of drawing from the likelihood (`obs`) seems unnecessary. On the other hand, we know the variance of the distribution of y from the posterior, and it seems a shame to throw away this information.

Q3) Is there a reason to expect the credible intervals to vary a lot between the training period and the forecast period? In both cases the credible intervals are based on the forecast values, which depend only on the posterior samples of the coefficients, so I don't expect the credible intervals to increase drastically in the forecast period relative to the training period. Does that make sense?
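On Q3, even when the intervals depend only on the coefficient posteriors, they can still widen in the forecast period if the predictors there move away from the training range: uncertainty in a slope is amplified by the magnitude of x. A small NumPy sketch, using hypothetical posterior draws (`b0`, `b1`, and the assumption that the training data were centred near x = 0, are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical posterior draws of intercept and slope
b0 = rng.normal(1.0, 0.1, size=4000)
b1 = rng.normal(2.0, 0.1, size=4000)

def interval_width(x):
    """Width of the equal-tailed 95% interval of mu = b0 + b1 * x."""
    mu = b0 + b1 * x  # posterior draws of the regression line at x
    lo, hi = np.percentile(mu, [2.5, 97.5])
    return hi - lo

# a point inside the (assumed) training range vs a far-out forecast point
w_in = interval_width(0.5)
w_out = interval_width(10.0)
```

So if the forecast-period intervals are much wider, extrapolation in the predictors is one plausible, unalarming explanation.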