Interpretation of posterior predictive checks for a Gaussian Process

Hi All,

I am new to PyMC and would like to thank all people contributing to PyMC. I enjoy exploring PyMC and will likely switch to PyMC from R-based Bayesian tools.

I have a specific question about the interpretation of the posterior predictive checks (PPC).

First things first, my model is a simple GP model:

with pm.Model() as model:

    if LIKELIHOOD == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
    elif LIKELIHOOD == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)

    #===============================================================================
    # specify the mean function:
    #===============================================================================
    mu = LinearMean(K=K, Lambda=Lambda)

    eta_sq = pm.Exponential("eta_sq", 1.0)
    rho_sq = pm.Normal("rho_sq", 3.0, 0.25)

    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    cov = eta_sq * pm.gp.cov.Exponential(input_dim=len(Y), ls=rho_sq)

    #===============================================================================
    # specify the GP:
    #===============================================================================
    gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)

    if SIGMA_MODEL == 'halfcauchy':
        sigma = pm.HalfCauchy("sigma", 1)
    else:
        sigma = pm.Exponential("sigma", 2.0)

    Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma=sigma)

    trace = pm.sample(2000, tune=2000, target_accept=0.9, random_seed=RANDOM_SEED,
                      return_inferencedata=True,
                      idata_kwargs={'log_likelihood': True})

Then I did the posterior predictive sampling using:

with model:
    pp = pm.sample_posterior_predictive(trace, var_names=['Y_'],
                                        extend_inferencedata=True, random_seed=RANDOM_SEED)

The result looks like:

The issue is that all observations should be positive, as can be seen from the "observed" curve. My guess is that when new data points were drawn by pm.sample_posterior_predictive, the sampling space of some features was automatically associated with negative values. But, physically, all of my features should be greater than or equal to 0, leading to positive observations.
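
To quantify this, one quick check is how much of the posterior predictive mass falls below zero (a rough sketch, assuming pp is the InferenceData returned by sample_posterior_predictive above):

# fraction of posterior predictive draws of Y_ that are negative
neg_frac = (pp.posterior_predictive["Y_"] < 0).mean().item()
print(f"Fraction of negative draws: {neg_frac:.3f}")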

Then I wrote my own PPC code (for readability I captured the code section from VSC):

Although I am not completely sure this is correct, I randomly perturbed the original feature matrix "K" (169 x 304, all positive values) to create a new X (features) using:

new_X = np.empty_like(K_copy)  # K_copy: a copy of the original K matrix (assumed)
for j in range(K_copy.shape[0]):
    new_X[j, :] = K_copy[j, :] * np.random.uniform(0.8, 1.2, size=K_copy.shape[1])

My own PPC looks like:

This looks more reasonable.

So, my question is which PPC result (az.plot_ppc vs. my own) is correct/better, or are both correct, or something else?

Thank you for your help in advance.

I haven't looked too closely at your PPC code, but it looks like the root of your issue is that gp.Marginal assumes a Gaussian likelihood, but you'd like to try LogNormal or TruncatedNormal because your observed data is positive valued. What you can do is use gp.Latent instead, which doesn't assume a likelihood. This example might be helpful here. How many covariates is your GP over? If small, say 2 or less, you might try HSGP as a replacement for gp.Latent to speed things up.
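
Roughly, the latent version could look like this. This is only a sketch, not your full model: X stands for whatever inputs you pass the GP, Y for your positive observations, and the priors are placeholders.

with pm.Model() as latent_model:
    eta_sq = pm.Exponential("eta_sq", 1.0)
    rho_sq = pm.Exponential("rho_sq", 0.5)
    cov = eta_sq * pm.gp.cov.Exponential(input_dim=1, ls=rho_sq)

    gp = pm.gp.Latent(cov_func=cov)   # no likelihood is assumed here
    f = gp.prior("f", X=X)            # unconstrained latent function values

    sigma = pm.Exponential("sigma", 2.0)
    # strictly positive likelihood: exp(f) acts as the (positive) median of Y
    Y_ = pm.LogNormal("Y_", mu=f, sigma=sigma, observed=Y)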


Hi Bill,

Thank you so much for your help!

Definitely, I will try to look at gp.Latent (I think I've seen it).

As a separate check, Iā€™ve tried the following:

Y_ = pm.MvNormal("Y_", mu=mu, cov=cov, observed=Y)

which is essentially the same as gp.Marginal - I got a very similar result.

The number of covariates (~300) is about 2 times the number of measurements. This means my state (latent) vector also has a size of ~300. My understanding is that in the Bayesian setting this is not unusual. When I checked the convergence for each latent variable, all looked good, but do you think this could be another problem?

Again, thanks so much!

  • Seongeun

essentially the same as gp.Marginal - I got a very similar result

Yup that's right, pm.MvNormal is pretty much all gp.Marginal does when fitting.

I see, you have 300 data points so that's the size of the GP, not unusual at all. If convergence is good then you're most likely good to go.
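
For reference, the marginalized model that gp.Marginal fits (and that your pm.MvNormal line writes out by hand) is roughly

y \sim \text{MvNormal}\big(m(X),\; K(X, X) + \sigma^2 I\big)

where m is the mean function, K is the covariance function, and the observation noise \sigma^2 gets folded into the covariance.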


Hi Bill,

Thanks for your previous valuable help!

I tried to implement gp.Latent() based on the example you pointed to. My revised code looks like this (I added MODEL == 'GP_Latent'):

MODEL = 'GP_Latent'
LIKELIHOOD = 'truncated_normal'
with pm.Model() as model3:
    
    if LIKELIHOOD == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
    elif LIKELIHOOD == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
   #===============================================================================
    # specify the mean function:
    #===============================================================================
    if MODEL == 'MvNormal':
        print ('>>>>> Using my own Cov model ...')
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
    else:
        print ('>>>>> Using GP model ...')
        if MODEL == 'GP':
            mu = LinearMean(K = K, Lambda = Lambda)
    
    eta_sq = pm.Exponential("eta_sq", 2)
    rho_sq = pm.Exponential("rho_sq", 0.5)

    if SIGMA_MODEL == 'halfcauchy':
        sigma = pm.HalfCauchy ("sigma", 1)
    else:
        sigma = pm.Exponential("sigma", 2.0)

    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    if MODEL == 'MvNormal':
        cov = cov_fun_temporal (eta_sq, rho_sq, sigma)
    else:
        cov = eta_sq * pm.gp.cov.Exponential(input_dim=len (Y), ls=rho_sq)

    #===============================================================================
    # specify the GP:
    #===============================================================================
    if MODEL == "GP":
        gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
    elif MODEL == 'GP_Latent':
        gp = pm.gp.Latent(cov_func=cov)
     
        # @Dmat: temporal distance matrix 
        f = gp.prior("f", X=Dmat)

    #===============================================================================
    # Likelihood
    #===============================================================================
    if MODEL == 'MvNormal':
        print ('>>>>> Using MvNormal specification...')
        Y_ = pm.MvNormal ("Y_", mu=mu, cov=cov, observed=Y)
    elif MODEL == 'GP':    
        print ('>>>>> Using GP marginal model specification...')
        Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma = sigma)
    elif MODEL == 'GP_Latent':
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
        Y_ = pm.Normal("Y_", mu=mu, sigma = sigma, observed=Y)
       
    if MODEL == 'GP_Latent':
        trace = pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                        idata_kwargs = {'log_likelihood': True})
    else:
        trace = pm.sample_prior_predictive()

        trace.extend(pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                        idata_kwargs = {'log_likelihood': True}))

In the code, Dmat is the temporal distance matrix (in hours) because my observations are a time series. When I use "X=Dmat", the f prior appears to take "0" as the mean.
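
If I understand correctly, this is because pm.gp.Latent defaults to a zero mean function, so a non-zero mean for f would have to be passed in explicitly, something like (sketch only, reusing the names from my code above):

# pass the linear mean to the latent GP instead of relying on the default Zero() mean
gp = pm.gp.Latent(mean_func=LinearMean(K=K, Lambda=Lambda), cov_func=cov)
f = gp.prior("f", X=Dmat)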

Just for clarification, Dmat is used in my own MvNormal (i.e., Y_ = pm.MvNormal("Y_", mu=mu, cov=cov, observed=Y)) as follows:

def cov_fun_temporal(eta_sq, rho_sq, sigma):
    N = len(Y)

    # observation noise: sigma**2 on the diagonal
    sigma_sq_diag_mat = np.diag(np.ones(N))
    SIGMA = sigma_sq_diag_mat * sigma**2
    #print (SIGMA.eval())

    # exponential covariance in temporal distance, plus the noise term
    T = eta_sq * np.exp(-rho_sq * Dmat) + SIGMA
    return T

As before, the PPC result does not capture the observation entirely:

Now, I wonder 1) whether the gp.Latent() function was implemented correctly, and/or 2) whether the GP model shown here is even an adequate model for this time series data.

Again, thank you for your help!

  • Seongeun

Hi Bill and others,

I was on a different project so I didn't follow this for a while, but I wonder if you could offer any further help to resolve this issue with the following PPC plot (the code for this plot is shown below):

My updated code is shown below. I think that this model with MODEL = 'GP_Latent_TS' is the correct use of the GP model in PyMC.

Thank you.

LIKELIHOOD = 'truncated_normal'
MODEL = 'GP_Latent_TS'

with pm.Model() as model3:
    
    if LIKELIHOOD == 'lognormal':
        Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
        # Truncated not working ...
        #Lambda = pm.Truncated("Lambda", pm.LogNormal.dist(mu = mu_logx, sigma = sigma_logx), shape=m, upper=10)
    elif LIKELIHOOD == 'truncated_normal':
        normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
        Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
        #Lambda = pm.Normal("Lambda", 1, 0.5, shape=m)
    
    #===============================================================================
    # Mean function
    #===============================================================================
    if MODEL == 'MvNormal':
        print ('>>>>> Using my own Cov model ...')
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
    else:
        print ('>>>>> Using GP model ...')
        if MODEL in ['GP', 'GP_Latent_TS']:
            mu = LinearMean(K = K, Lambda = Lambda)
    
    # Stat Rethinking, P. 470
    eta_sq = pm.Exponential("eta_sq", 2)
    #rho_sq = pm.Normal("rho_sq", 3.0, 0.25)
    rho_sq = pm.Exponential("rho_sq", 0.5)

    if SIGMA_MODEL == 'halfcauchy':
        sigma = pm.HalfCauchy ("sigma", 1)
    else:
        sigma = pm.Exponential("sigma", 2.0)

    #===============================================================================
    # specify the covariance function:
    #===============================================================================
    if MODEL == 'MvNormal':
        cov = cov_fun_temporal (eta_sq, rho_sq, sigma)
    else:
        if MODEL == 'GP':
            cov = eta_sq * pm.gp.cov.Exponential(input_dim=len (Y), ls=rho_sq)
        else:
            cov = eta_sq * pm.gp.cov.Exponential(input_dim=1, ls=rho_sq)
            #cov = eta_sq * pm.gp.cov.ExpQuad(input_dim=1, ls=rho_sq)
            # For stability
            cov += pm.gp.cov.WhiteNoise(1e-6)
            
    #===============================================================================
    # specify the GP:
    #===============================================================================
    if MODEL == "GP":
        gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
    elif MODEL == 'GP_Latent':
        gp = pm.gp.Latent(cov_func=cov)
        # @Dmat: temporal distance matrix 
        f = gp.prior("f", X=Dmat)
    elif MODEL == "GP_Latent_TS":
        gp = pm.gp.Latent(mean_func=mu, cov_func=cov)
        f = gp.prior("f", X=X)

    #===============================================================================
    # Likelihood
    #===============================================================================
    if MODEL == 'MvNormal':
        print ('>>>>> Using MvNormal specification ...')
        Y_ = pm.MvNormal ("Y_", mu=mu, cov=cov, observed=Y)
    elif MODEL == 'GP':    
        print ('>>>>> Using GP marginal model specification ...')
        Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma = sigma)
    elif MODEL == 'GP_Latent':
        mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
        Y_ = pm.Normal("Y_", mu=mu, sigma = sigma, observed=Y)
    elif MODEL == 'GP_Latent_TS':
        if True:
            Y_ = pm.Normal("Y_", mu=f, sigma = sigma, observed=Y)
        else:
            SIGMA = np.diag (np.ones (len(Y))) * sigma**2
            Y_ = pm.MvNormal("Y_", mu=f, cov = SIGMA, observed=Y)
       
    if MODEL in ['GP_Latent', 'GP_Latent_TS']:
        trace = pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                        idata_kwargs = {'log_likelihood': True})
    else:
        trace = pm.sample_prior_predictive()

        trace.extend(pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                        idata_kwargs = {'log_likelihood': True}))
        

Is the problem that the model isn't respecting the positivity constraint of the data? GPs are built from normal distributions, so by default the model has no way to respect that constraint. You could:

  1. Use a likelihood function that's strictly positive, such as pm.LogNormal, and use the (exponential of the) GP as a prior to that likelihood.
  2. Do some kind of transformation to your input data to put it on \mathbb R, model that transformed data, then apply the inverse transform to your model's outputs. For example, if you're working with prices, I'd model log returns instead of modeling prices directly.

I'd probably lean towards 2 (a rough sketch is below), but it might be my frequentist training speaking.
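
A minimal sketch of option 2 with the simplest possible transform (a log), assuming Y is strictly positive; the names here are placeholders, not your actual variables:

import numpy as np

y_log = np.log(Y)   # transformed data now lives on all of R

# ...build the same model, but pass observed=y_log to the likelihood...

# after sampling the posterior predictive of the transformed variable,
# apply the inverse transform so predictions are positive again, e.g.:
# Y_pred = np.exp(idata.posterior_predictive["Y_log"])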


Jesse,

I appreciate your help!

Your suggestions all make sense.

Initially, I thought the negativity in the predictions was the problem. Also, because of the negative mass there, the overall density shape does not seem consistent with the observations.

I am using either a lognormal or a truncated normal prior to sample positive values for "Lambda", which is my key latent variable. Theoretically, as long as "Lambda" is positive, the predicted value (equivalent to the measurement Y) should also be positive: if you take any values from the sampled Lambda and multiply them by "K" (all positive values), you always get positive values (i.e., Y_mean = K*Lambda_mean).

Because I am trying to infer the "Lambda" values rather than predictions of Y, the PPC may not matter much as long as there is reasonable consistency between predicted and observed Y. As an aside, I am not sure how exactly the pm.sample_posterior_predictive function works; I've looked at the source code, but it wasn't easy to follow.

I will look into this more.

Again, thanks for the help!

  • Seongeun

Lambda is the mean of the process (sort of; after marginalization the covariance terms will get mixed into the mean as well). As you can see in your PPC, the output is centered above zero, so that's something.

Posterior predictive sampling samples from a requested random variable by first drawing all of the variables it depends on from their respective posterior distributions. Consider this model:

\begin{align}y &\sim N(\mu, \sigma) \\ \mu &\sim N(0, 1) \\ \sigma &\sim Exponential(1)\end{align}

In my way of speaking, the random variable y is "downstream" of \mu and \sigma, because I can't sample from it until I first sample from \mu and \sigma. Graphically this model looks like this:

with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    y = pm.Normal('y', mu=mu, sigma=sigma)
pm.model_to_graphviz(m)

[pm.model_to_graphviz output: a DAG with mu and sigma pointing into y]

In this DAG, you can again see that y depends on \mu and \sigma. So if you do pm.sample_posterior_predictive(idata, var_names=['y']), what will happen is:

  1. A posterior sample for \mu will be randomly selected
  2. A posterior sample for \sigma will be randomly selected
  3. The sample of \mu and \sigma will be used to parameterize the given definition of y, which is y\sim N(\mu, \sigma)
  4. A sample y will be drawn from this new distribution

What happens if you instead ask for pm.sample_posterior_predictive(idata, var_names=['mu'])? \mu depends on nothing; there are no edges pointing into \mu on the graph. So we can skip steps (1) and (2) and go straight to (3). \mu was defined as \mu \sim N(0, 1), so it will draw from that. That's just the prior though, which is probably not what you were expecting! If you just want to look at the posterior, though, that's already saved in your idata, so there's no need to do any further sampling.
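
Here is the same toy model made runnable end to end, just to illustrate the two cases (the observed data is made up):

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y_obs = rng.normal(loc=1.0, scale=0.5, size=50)   # fake observations

with pm.Model() as m:
    mu = pm.Normal('mu')
    sigma = pm.Exponential('sigma', 1)
    y = pm.Normal('y', mu=mu, sigma=sigma, observed=y_obs)

    idata = pm.sample(1000, tune=1000)
    # draws new y values by first drawing mu and sigma from their posteriors
    pm.sample_posterior_predictive(idata, var_names=['y'], extend_inferencedata=True)

# mu's posterior is already stored in idata, no predictive sampling needed to look at it
print(idata.posterior['mu'].mean().item())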

It looks like your Lambda variable falls into this 2nd category. So you don't actually want to do posterior predictive sampling on it; you can just directly use the posterior. I'd show an example with your code, but it's not copy-pastable.
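
In generic form (treat this as a sketch that reuses the variable names from your posts above), working with the posterior directly looks like:

import arviz as az

az.summary(trace, var_names=["Lambda"])          # posterior summaries of Lambda
az.plot_posterior(trace, var_names=["Lambda"])   # marginal posterior plots
lambda_median = trace.posterior["Lambda"].median(dim=("chain", "draw"))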


Dear Jesse,

Please accept my apologies for the late reply!

Due to a combination of responsibilities across different projects, coupled with my vacation, it took me a while to return to pymc.

I must say, your response is one of the best answers I have received from any online community. Your example was particularly helpful in aiding my understanding of the process.

Based on your advice, I intend to simply use the posterior values for Lambda.

Regarding your comment "I'd show an example with your code but it's not copy-pastable," just out of curiosity, if possible, I'd be interested in seeing an example based on my code. I'm including it below (copied and pasted from my previous post without styling) for your reference (if you are referring to a different code of mine, please let me know).

Again, thank you so much for your fantastic help!

– Seongeun

LIKELIHOOD = 'truncated_normal'
MODEL = 'GP_Latent_TS'

with pm.Model() as model3:

if LIKELIHOOD == 'lognormal':
    Lambda = pm.LogNormal("Lambda", mu_logx, sigma_logx, shape=m)
    # Truncated not working ...
    #Lambda = pm.Truncated("Lambda", pm.LogNormal.dist(mu = mu_logx, sigma = sigma_logx), shape=m, upper=10)
elif LIKELIHOOD == 'truncated_normal':
    normal_dist = pm.Normal.dist(mu=mu_x, sigma=sigma_x)
    Lambda = pm.Truncated("Lambda", normal_dist, shape=m, lower=0, upper=10)
    #Lambda = pm.Normal("Lambda", 1, 0.5, shape=m)

#===============================================================================
# Mean function
#===============================================================================
if MODEL == 'MvNormal':
    print ('>>>>> Using my own Cov model ...')
    mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
else:
    print ('>>>>> Using GP model ...')
    if MODEL in ['GP', 'GP_Latent_TS']:
        mu = LinearMean(K = K, Lambda = Lambda)

# Stat Rethinking, P. 470
eta_sq = pm.Exponential("eta_sq", 2)
#rho_sq = pm.Normal("rho_sq", 3.0, 0.25)
rho_sq = pm.Exponential("rho_sq", 0.5)

if SIGMA_MODEL == 'halfcauchy':
    sigma = pm.HalfCauchy ("sigma", 1)
else:
    sigma = pm.Exponential("sigma", 2.0)

#===============================================================================
# specify the covariance function:
#===============================================================================
if MODEL == 'MvNormal':
    cov = cov_fun_temporal (eta_sq, rho_sq, sigma)
else:
    if MODEL == 'GP':
        cov = eta_sq * pm.gp.cov.Exponential(input_dim=len (Y), ls=rho_sq)
    else:
        cov = eta_sq * pm.gp.cov.Exponential(input_dim=1, ls=rho_sq)
        #cov = eta_sq * pm.gp.cov.ExpQuad(input_dim=1, ls=rho_sq)
        # For stability
        cov += pm.gp.cov.WhiteNoise(1e-6)
        
#===============================================================================
# specify the GP:
#===============================================================================
if MODEL == "GP":
    gp = pm.gp.Marginal(mean_func=mu, cov_func=cov)
elif MODEL == 'GP_Latent':
    gp = pm.gp.Latent(cov_func=cov)
    # @Dmat: temporal distance matrix 
    f = gp.prior("f", X=Dmat)
elif MODEL == "GP_Latent_TS":
    gp = pm.gp.Latent(mean_func=mu, cov_func=cov)
    f = gp.prior("f", X=X)

#===============================================================================
# Likelihood
#===============================================================================
if MODEL == 'MvNormal':
    print ('>>>>> Using MvNormal specification ...')
    Y_ = pm.MvNormal ("Y_", mu=mu, cov=cov, observed=Y)
elif MODEL == 'GP':    
    print ('>>>>> Using GP marginal model specification ...')
    Y_ = gp.marginal_likelihood("Y_", X=Dmat, y=Y, sigma = sigma)
elif MODEL == 'GP_Latent':
    mu = pm.math.dot(np.array(K, dtype=np.float32), Lambda)
    Y_ = pm.Normal("Y_", mu=mu, sigma = sigma, observed=Y)
elif MODEL == 'GP_Latent_TS':
    if True:
        Y_ = pm.Normal("Y_", mu=f, sigma = sigma, observed=Y)
    else:
        SIGMA = np.diag (np.ones (len(Y))) * sigma**2
        Y_ = pm.MvNormal("Y_", mu=f, cov = SIGMA, observed=Y)
   
if MODEL in ['GP_Latent', 'GP_Latent_TS']:
    trace = pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                    idata_kwargs = {'log_likelihood': True})
else:
    trace = pm.sample_prior_predictive()

    trace.extend(pm.sample(NUM_SAMPLE, tune=TUNE, target_accept=0.9, random_seed=RANDOM_SEED,\
                    idata_kwargs = {'log_likelihood': True}))

Hey Seongeun,

I'm glad it was helpful. The code you provided still isn't "copy-pastable"; by that I mean code I can paste directly into a Jupyter cell and run. You will need to add imports and a function to generate the data.

Thanks for clarifying your comment, Jesse!

I've cleaned up my code, which I started writing to learn PyMC, and am attaching it here with the input data. I tested it on my Mac and on Colab, so hopefully it should work without issues.

A few notes:

  • I use the Latent model as a default (i.e., MODEL = 'GP_Latent_TS') based on the comments I received here.
  • I just used a few hundred samples for testing purposes in this slow model; this needs to be adjusted.
  • I have two options for priors for my state vector "Lambda": truncated normal and lognormal. I find that the lognormal prior yields unrealistic posterior values, likely due to the long-tailed distribution, but I don't know how to set the bounds for the lognormal. If you have any ideas, please let me know.
  • I added some plots; I guess they will show what I am trying to see as the final result, which is the median Lambda values and their uncertainty.
  • Given the inherent error in my physical model in producing the K matrix, GP may not capture all of the observations.
  • If you see any mistakes or have any comments or suggestions, your help would be appreciated.

I'm still not sure about posterior predictive sampling (i.e., sample_posterior_predictive) for my positive observations Y, although I may not need it. In that case, what should be used for model checking?

I couldn't attach my Jupyter notebook file (not allowed?), so I converted it to a Python script. In Visual Studio Code, I could run the whole script after pasting it into a single notebook cell. I guess you know this better than I do ...

Again, thank you so much for your help!

– Seongeun

GP.py (10.4 KB)

Dmat.csv (589.5 KB)
Y.csv (3.1 KB)
K.csv (908.5 KB)