Interpretation of posterior predictive checks for a Gaussian Process

Hi Bill,

Thanks for your previous valuable help!

I tried to implement gp.Latent() based on the example you pointed to. My revised code looks like this (I added a MODEL = 'GP_Latent' branch):

MODEL = 'GP_Latent'
LIKELIHOOD = 'truncated_normal'
with pm.Model() as model3:
    
    if LIKELIHOOD == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
    elif LIKELIHOOD == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
    #===============================================================================
    # specify the mean function:
    #===============================================================================
    if MODEL == 'MvNormal':
        print('>>>>> Using my own Cov model ...')
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
    else:
        print('>>>>> Using GP model ...')
        if MODEL == 'GP':
            mu = LinearMean(K=K, Lambda=Lambda)
    
    eta_sq = pm.Exponential("eta_sq", 2)
    rho_sq = pm.Exponential("rho_sq", 0.5)

    if SIGMA_MODEL == 'halfcauchy':
        sigma = pm.HalfCauchy("sigma", 1)
    else:
        sigma = pm.Exponential("sigma", 2.0)

    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    if MODEL == 'MvNormal':
        cov = cov_fun_temporal(eta_sq, rho_sq, sigma)
    else:
        cov = eta_sq * pm.gp.cov.Exponential(input_dim=len(Y), ls=rho_sq)

    #===============================================================================
    # specify the GP:
    #===============================================================================
    if MODEL == "GP":
        gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
    elif MODEL == 'GP_Latent':
        gp = pm.gp.Latent(cov_func=cov)
     
        # @Dmat: temporal distance matrix 
        f = gp.prior("f", X=Dmat)

    #===============================================================================
    # Likelihood
    #===============================================================================
    if MODEL == 'MvNormal':
        print('>>>>> Using MvNormal specification...')
        Y_ = pm.MvNormal("Y_", mu=mu, cov=cov, observed=Y)
    elif MODEL == 'GP':
        print('>>>>> Using GP marginal model specification...')
        Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma=sigma)
    elif MODEL == 'GP_Latent':
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
        Y_ = pm.Normal("Y_", mu=mu, sigma=sigma, observed=Y)
       
    if MODEL == 'GP_Latent':
        trace = pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,
                          idata_kwargs={'log_likelihood': True})
    else:
        trace = pm.sample_prior_predictive()
        trace.extend(pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,
                               idata_kwargs={'log_likelihood': True}))

In the code, Dmat is the temporal distance matrix (in hours), because my observations form a time series. When I use X=Dmat, the f prior appears to take 0 as its mean.
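To double-check my understanding of that zero mean, I verified in plain NumPy (with made-up sizes and hyperparameters, not my real data) that draws from a zero-mean GP prior average to zero everywhere, so any non-zero mean has to be added explicitly — e.g. via gp.Latent's mean_func argument, or by adding the mean term to mu in the likelihood:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes and timestamps (not from the real model)
N = 50
t = np.linspace(0.0, 10.0, N)
Dmat = np.abs(t[:, None] - t[None, :])   # temporal distance matrix

eta_sq, rho_sq = 1.0, 0.5
K = eta_sq * np.exp(-rho_sq * Dmat)      # exponential kernel, as in cov_fun_temporal

# Draws from the zero-mean GP prior: their pointwise average is ~0
f = rng.multivariate_normal(np.zeros(N), K, size=2000)
print(np.abs(f.mean(axis=0)).max())      # close to 0

# Re-centering: y = mu + f has mean mu, so a non-zero mean must be
# supplied explicitly (via mean_func or inside the likelihood's mu)
mu = 3.0 * np.ones(N)
y = mu + f
print(np.abs(y.mean(axis=0) - mu).max())  # close to 0 again
```

So the "0 mean" of f is expected behavior rather than a bug, if I understand correctly.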

Just for clarification, Dmat is used in my own MvNormal model (i.e., Y_ = pm.MvNormal("Y_", mu=mu, cov=cov, observed=Y)) as follows:

def cov_fun_temporal(eta_sq, rho_sq, sigma):
    N = len(Y)

    # observation-noise variance on the diagonal
    SIGMA = sigma**2 * np.eye(N)
    # exponential decay with temporal distance, plus noise
    T = eta_sq * np.exp(-rho_sq * Dmat) + SIGMA
    return T
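To sanity-check this custom kernel against the built-in one, I confirmed in NumPy (again with made-up timestamps) that eta_sq * exp(-rho_sq * D) matches an exponential covariance of the form k(r) = exp(-r / ls) with ls = 1/rho_sq, provided the inputs are an N×1 column of timestamps (so the covariance function computes the pairwise distances itself) rather than the N×N Dmat. I'm not certain whether pm.gp.cov.Exponential uses exp(-r/ls) or exp(-r/(2*ls)) in its exponent — if the latter, the mapping would be ls = 1/(2*rho_sq):

```python
import numpy as np

# Hypothetical timestamps in hours (the real ones come from my series)
t = np.array([0.0, 1.5, 4.0, 7.0, 11.0])
Dmat = np.abs(t[:, None] - t[None, :])

eta_sq, rho_sq = 2.0, 0.3

# Custom kernel from cov_fun_temporal (noise term omitted here)
T_custom = eta_sq * np.exp(-rho_sq * Dmat)

# Same kernel written the way a distance-based covariance function
# would compute it from an N x 1 input column X = t[:, None]
ls = 1.0 / rho_sq
X = t[:, None]
r = np.abs(X - X.T)                      # pairwise Euclidean distances
T_builtin = eta_sq * np.exp(-r / ls)

print(np.allclose(T_custom, T_builtin))  # True
```

If that reading is right, passing Dmat itself as X would make the covariance function treat each row of the distance matrix as an N-dimensional input point, which is probably not what I intended.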

As before, the PPC result does not fully capture the observations:

Now, I wonder 1) whether the gp.Latent() model was implemented correctly, and/or 2) whether the GP model shown here is even adequate for this time-series data.
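On question 2, I was thinking one way to make "does not capture the observations" more quantitative is to check empirical coverage: the fraction of observed points that fall inside a central posterior-predictive interval. A sketch with simulated stand-in draws (in practice y_rep would come from pm.sample_posterior_predictive, and the numbers here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: 100 observations, 4000 predictive draws
N = 100
y_obs = rng.normal(0.0, 1.0, size=N)
y_rep = rng.normal(0.0, 1.0, size=(4000, N))  # draws x observations

# Per-observation 94% central interval from the predictive draws
lo = np.quantile(y_rep, 0.03, axis=0)
hi = np.quantile(y_rep, 0.97, axis=0)

coverage = np.mean((y_obs >= lo) & (y_obs <= hi))
print(coverage)  # near 0.94 when the model is well calibrated
```

If the coverage on my real data came out well below the nominal level, that would confirm the model is missing part of the observations rather than it just looking that way in the plot.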

Again, thank you for your help!

  • Seongeun