Posterior prediction

Dear All,

Thanks for the previous assistance. After experimenting with PyMC for a few months, I’ve decided to use it in my transition from R.

I’ve posted on a related topic before, but this time, my question is slightly different. (I’ve included the model (with a slight change) from my previous post below for reference.)

Jumping directly to the question: I have a posterior predictive plot:

[image: posterior predictive plot from the first snippet]

which was generated from:

with model:
    x_conditional = gp.conditional('x_conditional', X_new)
    y_pred_samples = pm.sample_posterior_predictive(trace, var_names=['x_conditional'])

y_pred_samples_mean = y_pred_samples['posterior_predictive'].mean(['chain', 'draw']).to_array().values.ravel()
y_pred_samples_std = y_pred_samples['posterior_predictive'].std(['chain', 'draw']).to_array().values.ravel()

y_pred_samples_mean_plus = y_pred_samples_mean + 2*y_pred_samples_std
y_pred_samples_mean_minus = y_pred_samples_mean - 2*y_pred_samples_std

fig, ax = plt.subplots()
sns.lineplot(x=range(len(Y)), y=Y, color=sns_c[0], label='Observation', ax=ax)

sns.lineplot(x=X_new.flatten(), y=y_pred_samples_mean, color=sns_c[2], label='Prediction', ax=ax)

ax.fill_between(
        x=X_new.flatten(),  # match the x-axis of the prediction line above
        y1=y_pred_samples_mean_minus,
        y2=y_pred_samples_mean_plus,
        color=sns_c[2],
        alpha=0.8,
        label='credible_interval (train)'
    )

But when I used:

with model:
    ppc = pm.sample_posterior_predictive(trace, var_names=['Y_'],
                    extend_inferencedata=True, random_seed=RANDOM_SEED)
ppc_mean = ppc['posterior_predictive']['Y_'].mean(['chain', 'draw'])
ppc_std = ppc['posterior_predictive']['Y_'].std(['chain', 'draw'])

ppc_mean_plus = ppc_mean + 2*ppc_std
ppc_mean_minus = ppc_mean - 2*ppc_std

fig, ax = plt.subplots()
sns.lineplot(x=X.flatten(), y=Y, color=sns_c[0], label='y_train', ax=ax)

sns.lineplot(x=X.flatten(), y=ppc_mean, color=sns_c[2], label='y_pred_train', ax=ax)

ax.fill_between(
        x=X.flatten(),
        y1=ppc_mean_minus,
        y2=ppc_mean_plus,
        color=sns_c[2],
        alpha=0.8,
        label='credible_interval (train)'
    )

I get:

[image: posterior predictive plot from the second snippet, with a noticeably wider uncertainty band]

When I checked against Eq. 2.24 from “Gaussian Processes for Machine Learning” (GPML), the latter (i.e., the one with the wider uncertainty range) seems to be the correct one. The GPML equation assumes a zero mean, but it was still useful for checking the uncertainty range. I don’t know how to handle the non-zero-mean case by hand, but I think pm.sample_posterior_predictive should take care of it.
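For reference, here is how I transcribe the GPML predictive equations (Eqs. 2.23–2.24, zero-mean case) for noisy observations $y = f(x) + \varepsilon$, $\varepsilon \sim \mathcal{N}(0, \sigma_n^2)$:

$$\bar{\mathbf{f}}_* = K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,\mathbf{y}$$

$$\operatorname{cov}(\mathbf{f}_*) = K(X_*, X_*) - K(X_*, X)\,[K(X, X) + \sigma_n^2 I]^{-1}\,K(X, X_*)$$

(If I read GPML correctly, Eq. 2.24 gives the covariance of the latent $\mathbf{f}_*$; for predictions of $y_*$ one would add $\sigma_n^2$ to the diagonal.)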

I wonder if someone can confirm this and/or explain the difference. I suspect the answer is a simple one, but I just want to double-check.
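In case it helps anyone checking this, here is the kind of toy NumPy comparison I have in mind (a minimal sketch with a made-up RBF kernel and synthetic data, not my actual model):

import numpy as np

# Toy check of GPML Eqs. 2.23-2.24 (zero mean); the kernel, data, and noise
# level below are made up purely for illustration.
def k(a, b, var=1.0, ls=1.0):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

rng = np.random.default_rng(0)
X = np.linspace(0, 5, 20)
y = np.sin(X) + 0.3 * rng.standard_normal(20)
X_star = np.linspace(0, 5, 50)
sigma_n = 0.3

K_noisy = k(X, X) + sigma_n**2 * np.eye(len(X))  # K(X, X) + sigma_n^2 I
K_sx = k(X_star, X)                              # K(X_*, X)

mean_f = K_sx @ np.linalg.solve(K_noisy, y)                          # Eq. 2.23
cov_f = k(X_star, X_star) - K_sx @ np.linalg.solve(K_noisy, K_sx.T)  # Eq. 2.24
var_f = np.diag(cov_f)       # latent f_*: the narrower band
var_y = var_f + sigma_n**2   # y_* = f_* + noise: the wider band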

Thank you.


MODEL:

with pm.Model() as model:

    if LAMBDA_PRIOR == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_log_lambda, sigma_log_lambda, shape=m)
    elif LAMBDA_PRIOR == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_lambda, sigma=sigma_lambda)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
   
    #===============================================================================
    # Mean function
    #===============================================================================
    if MODEL == 'MvNormal':
        print('>>>>> Using my own Cov model ...')
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
    else:
        print ('>>>>> Using GP model ...')
        if MODEL in ['GP', 'GP_Latent_TS']:
            mu = LinearMean(K = K, Lambda = Lambda)

    #===============================================================================
    # Kernel variance
    #===============================================================================
    var = pm.Exponential("kernel_var",  2)

    #===============================================================================
    # Kernel length
    #===============================================================================
    # 7 days over 31 days/mo and 3 months
    length = pm.Exponential("kernel_length", 1/(7/(31*3)))

    if KERNEL_NOISE == 'halfcauchy':
        noise = pm.HalfCauchy("noise", 5, shape=3)
    elif KERNEL_NOISE == 'exponential':
        noise = pm.Exponential("noise", 2.0, shape=3)

    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    if MODEL == 'MvNormal':
        cov = cov_fun_temporal(var, length, noise)
    else:
        if MODEL == 'GP':
            cov = var * pm.gp.cov.Exponential(input_dim=len(Y), ls=length)
        else:
            if KERNEL == 'expo':
                cov = var * pm.gp.cov.Exponential(input_dim=1, ls=length)
            elif KERNEL == 'matern52':
                cov = var * pm.gp.cov.Matern52(input_dim=1, ls=length)
    # Add white noise for stability
    cov += pm.gp.cov.WhiteNoise(1e-6)

    #===============================================================================
    # specify the GP:
    #===============================================================================
    if MODEL == "GP":
        gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
    elif MODEL == "GP_Latent_TS":
        gp = pm.gp.Latent(mean_func=mu, cov_func=cov)
        f = gp.prior("f", X=X)

    #===============================================================================
    # Likelihood
    #===============================================================================
    if MODEL == 'MvNormal':
        print('>>>>> Using MvNormal specification ...')
        Y_ = pm.MvNormal("Y_", mu=mu, cov=cov, observed=Y)
    elif MODEL == 'GP_Latent_TS':
        Y_ = pm.Normal("Y_", mu=f, sigma=noise[month_id], observed=Y)

    ################################################################################
    # Sampling
    ################################################################################
    if MODEL in ['GP_Latent', 'GP_Latent_TS']:
        if USE_JAX:
            trace = pm.sampling_jax.sample_numpyro_nuts(
                            NUM_SAMPLE, tune=TUNE,
                            target_accept=0.9, random_seed=RANDOM_SEED,
                            idata_kwargs={'log_likelihood': True}, chains=NUM_CHAIN)
        else:
            trace = pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,
                        idata_kwargs={'log_likelihood': True}, chains=4, cores=NUM_CHAIN)
    else:
        trace = pm.sample_prior_predictive()

        trace.extend(pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,
                        idata_kwargs={'log_likelihood': True}))

I think in the first example you are not including the additional noise from the likelihood – it’s just the GP, i.e. the posterior predictive of f only. In your second example you have the GP plus the likelihood noise. There’s some if/else in your model code, so I’m not sure which way things went. But if your likelihood is Gaussian, that means you’re modeling a GP plus some iid normal noise. That additional noise is what you’re seeing in your second example.
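To make that concrete, here is a minimal sketch of the two predictive quantities for the pm.gp.Latent case (month_id_new is a hypothetical month index for the new points, since your noise has shape 3):

with model:
    # f only: the latent GP evaluated at the new points (the narrower band)
    f_new = gp.conditional('f_new', X_new)
    # f plus the likelihood noise: predictions of y itself (the wider band)
    y_new = pm.Normal('y_new', mu=f_new, sigma=noise[month_id_new], shape=len(X_new))
    ppc_new = pm.sample_posterior_predictive(trace, var_names=['f_new', 'y_new'])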

It looks like your training data is all positive, so it’s possible that a normal likelihood isn’t appropriate and something else might fit better.
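For example (just a sketch of the idea, with everything else in your model unchanged, not a recommendation for your specific data), you could keep the latent GP and swap in a positive-support likelihood such as a Gamma parameterized by mean and standard deviation:

    # hypothetical alternative likelihood: exponentiate the latent GP so the
    # mean is positive, then use a Gamma instead of a Normal
    Y_ = pm.Gamma('Y_', mu=pm.math.exp(f), sigma=noise[month_id], observed=Y)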


Thanks very much, Bill!

Your answer is clear to me. For the result shown, I used the “GP_Latent_TS” method, that is, the latent GP.

One interesting observation: when I use “MvNormal”, the majority of the Pareto k diagnostic values (from “az.loo”) are labeled “very bad,” whereas with “pm.gp.Latent” most of the points are deemed “good.” This relates to your point about my likelihood not being normal; I believe that’s indeed the case. I appreciate the utility of “pm.gp.Latent.”
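(For anyone reading later, the check I ran was along these lines; this is the standard ArviZ call, and it assumes the log likelihood was stored at sampling time via idata_kwargs={'log_likelihood': True}:)

import arviz as az

loo = az.loo(trace)  # LOO cross-validation with Pareto k diagnostics
print(loo)           # shows how many points fall in each Pareto k category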

Again, thanks very much!