Pointwise Log Likelihood For Gaussian Processes

I am trying to compare different GP models using az.loo or az.waic. In PyMC these methods rely on the pointwise log-likelihood computed by the pm.compute_log_likelihood function.
The problem I am facing is that with GP models compute_log_likelihood does not return a value for each observation.
I was expecting the log_likelihood group to have shape (chain, draw, y_dim_0); instead it has (chain, draw).
Is there something I am missing?
Here is a reproducible example based on the Marginal Likelihood Implementation notebook.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import scipy as sp


# set the seed
np.random.seed(1)

n = 100  # The number of data points
X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP, they must be arranged as a column vector

# Define the true covariance function and its parameters
ell_true = 1.0
eta_true = 3.0
cov_func = eta_true**2 * pm.gp.cov.Matern52(1, ell_true)

# A mean function that is zero everywhere
mean_func = pm.gp.mean.Zero()

# The latent function values are one sample from a multivariate normal
# Note that we have to call `eval()` because PyMC is built on top of PyTensor
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()

# The observed data is the latent function plus a small amount of IID Gaussian noise
# The standard deviation of the noise is `sigma`
sigma_true = 2.0
y = f_true + sigma_true * np.random.randn(n)


with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfCauchy("eta", beta=5)

    cov = eta**2 * pm.gp.cov.Matern52(1, ell)
    gp = pm.gp.Marginal(cov_func=cov)

    sigma = pm.HalfCauchy("sigma", beta=5)
    y_ = gp.marginal_likelihood("y", X=X, y=y, sigma=sigma)

with model:
    marginal_post = pm.sample(nuts_sampler="pymc", idata_kwargs={"log_likelihood": True})

az.loo(marginal_post)

The loo computation, of course, produces the following UserWarning:
‘The point-wise LOO is the same with the sum LOO, please double check the Observed RV in your model to make sure it returns element-wise logp.’

CC @bwengals

This would be expected, because pm.gp.Marginal.marginal_likelihood is just a pm.MvNormal under the hood. It’s evaluating your y data as if it were a single data point – one draw from a multivariate normal.
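You can see this directly by comparing logp shapes (a minimal sketch; the toy vector y_demo is just for illustration):

import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
y_demo = rng.normal(size=5)

# MvNormal scores the whole vector as one observation -> scalar logp
mv = pm.MvNormal.dist(mu=np.zeros(5), cov=np.eye(5))
print(pm.logp(mv, y_demo).eval().shape)  # () -- a single number

# An elementwise Normal returns one logp per data point
print(pm.logp(pm.Normal.dist(0.0, 1.0), y_demo).eval().shape)  # (5,)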

You’ll need to either calculate the log-likelihoods manually, or resample your model using pm.gp.Latent with a separate Normal likelihood – or do what @jessegrabowski said below.

You’ll find that the sum of the log-likelihoods calculated this way will equal the log-likelihood from pm.gp.Marginal.

You shouldn’t need to resample the whole model. Instead, write the pm.gp.Latent + pm.Normal version of the model, then inside that context call pm.compute_log_likelihood, passing in the idata from the marginalized version.


First of all, thanks for the quick replies.
I tested some of your suggestions.

resample your model using pm.gp.Latent with separate Normal likelihood. The sum of the log-likelihoods calculated this way will equal the log-likelihood from pm.gp.Marginal.

Using the model from my first message, I have written its Latent version (which calculates the correct elementwise log-likelihood), but the sums of the log-likelihoods do not match: I get -3.37e+05 for the Latent model and -4.39e+05 for the Marginal model. I have set the random seed as suggested here.
The implementation of the Latent model is the following:

rng = np.random.default_rng(42)

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfCauchy("eta", beta=5)

    cov = eta**2 * pm.gp.cov.Matern52(1, ell)
    gp = pm.gp.Latent(cov_func=cov)

    f = gp.prior("f", X=X)
    
    sigma = pm.HalfCauchy("sigma", beta=5)
    y_ = pm.Normal('y', f, sigma, observed=y)

    latent_post = pm.sample(nuts_sampler="pymc", idata_kwargs={"log_likelihood": True}, random_seed=rng)

# I calculated the sum in this way
print(latent_post.log_likelihood.sum()) 
# compare it to the marginal model
print(marginal_post.log_likelihood.sum())

inside that context call pm.compute_log_likelihood, passing in the idata from the marginalized version.

As suggested, I did the following:

with pm.Model() as model:
    ell = pm.Gamma("ell", alpha=2, beta=1)
    eta = pm.HalfCauchy("eta", beta=5)

    cov = eta**2 * pm.gp.cov.Matern52(1, ell)
    gp = pm.gp.Latent(cov_func=cov)

    f = gp.prior("f", X=X)
    
    sigma = pm.HalfCauchy("sigma", beta=5)
    y_ = pm.Normal('y', f, sigma, observed=y)

    latent_post = pm.compute_log_likelihood(marginal_post)

and I get the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/opt/conda/envs/dev/lib/python3.12/site-packages/xarray/core/dataset.py in ?(self, names)
   1474                 variables[name] = self._variables[name]
   1475             except KeyError:
-> 1476                 ref_name, var_name, var = _get_virtual_variable(
   1477                     self._variables, name, self.sizes

KeyError: 'f_rotated_'

The only method that has worked so far is resampling the model with the Latent implementation, but it is very slow and not feasible in my application.

Do you have any ideas on how to solve this? My last resort would be to calculate it manually; any pointers on that would also be very helpful.

I also need to ask you another thing related to this topic. I am using pm.compute_log_likelihood on a sparse GP. My understanding is that the sparse GP implementation uses a pm.Potential, which pm.compute_log_likelihood ignores. Is that correct?
If so, how can I compute it? My guess is that the best way is to compute it manually, since writing the full Latent model and resampling would be too slow.
In theory, at least from my understanding, this pointwise log-likelihood (from the sparse model) would be calculated from the approximation of the likelihood given by FITC or VFE.

Thanks again

So I’d be careful here: VFE is a variational approximation, so what’s computed in the model isn’t the likelihood itself but a lower bound on it. FITC is also best viewed as a likelihood approximation; see this paper.

If I were you, I’d check Section 5.4.2 in GPML for the analytic calculation.
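For reference, here is a minimal NumPy sketch of those equations (5.10–5.12 in GPML) for the exact, non-sparse case; the function name loo_logp is just illustrative, and K is assumed to already include the noise variance sigma**2 on its diagonal. To get a (chain, draw, y_dim_0) array you would evaluate it once per posterior draw of the hyperparameters:

import numpy as np
from scipy import linalg

def loo_logp(K, y):
    # Analytic LOO log predictive densities for GP regression
    # (GPML eqs. 5.10-5.12). K is the n x n covariance of y,
    # i.e. the kernel matrix plus sigma**2 on the diagonal.
    n = len(y)
    cho = linalg.cho_factor(K, lower=True)
    K_inv = linalg.cho_solve(cho, np.eye(n))     # K^{-1}
    alpha = K_inv @ y                            # K^{-1} y
    d = np.diag(K_inv)                           # [K^{-1}]_ii
    mu = y - alpha / d                           # LOO predictive means
    var = 1.0 / d                                # LOO predictive variances
    return -0.5 * np.log(2 * np.pi * var) - (y - mu) ** 2 / (2 * var)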

For your point 2, that error might be happening because you have reparameterize=True (the default) in the call f = gp.prior("f", X=X). Try reparameterize=False.
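Concretely, that would be a one-line change in the Latent model above (only the changed line is shown; the rest of the model stays the same):

# sample f directly instead of the whitened f_rotated_ variable,
# so the free RV in the model is named "f"
f = gp.prior("f", X=X, reparameterize=False)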