Poor estimates from pymc examples on marginal and latent GP

giorgp · May 25, 2023, 2:42pm

Hi all, in a different thread I summarised the issues I had when trying to translate my stan code to pymc. To investigate what’s going wrong, I decided to take a step back and try to reproduce the pymc examples of marginal and latent GP formulations. However, my posteriors are quite poor, as you can see below:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

# set the seed
np.random.seed(1)

n = 100  # The number of data points
X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP, they must be arranged as a column vector

# Define the true covariance function and its parameters
ℓ_true = 1.0
η_true = 3.0
cov_func = η_true**2 * pm.gp.cov.Matern52(1, ℓ_true)

# A mean function that is zero everywhere
mean_func = pm.gp.mean.Zero()

# The latent function values are one sample from a multivariate normal
# Note that we have to call `eval()` because PyMC3 built on top of Theano
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()

# The observed data is the latent function plus a small amount of IID Gaussian noise
# The standard deviation of the noise is `sigma`
σ_true = 2.0
y = f_true + σ_true * np.random.randn(n)

## Plot the data and the unobserved latent function
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()
ax.plot(X, f_true, "dodgerblue", lw=3, label="True f")
ax.plot(X, y, "ok", ms=3, alpha=0.5, label="Data")
ax.set_xlabel("X")
ax.set_ylabel("The true f(x)")
plt.legend();

with pm.Model() as model_marginal:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)

    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    σ = pm.HalfCauchy("σ", beta=5)
    y_ = gp.marginal_likelihood("y", X=X, y=y, sigma=σ)

    idata_marginal = pm.sample(1000, tune=1000, chains=2, cores=1)

az.summary(idata_marginal, var_names = ["ℓ", "η", "σ"], round_to=2)

the output is:

	mean	sd	hdi_3%	hdi_97%	mcse_mean	mcse_sd	ess_bulk	ess_tail	r_hat
ℓ	1.20	0.32	0.66	1.78	0.01	0.01	1100.90	802.66	1.0
η	4.51	1.63	2.30	7.31	0.05	0.04	1144.79	1067.48	1.0
σ	1.93	0.15	1.66	2.21	0.00	0.00	1732.24	1234.32	1.0

As you can see, the mean estimate of η (4.51) is roughly 1sd away from the true value (3.00). I was expecting the estimate to be quite a lot better. The results from the latent formulation are comparably bad. Interestingly, the MAP estimates (not included above) are better than the MCMC sample estimates.

Am I doing something wrong, or is this the level of agreement you would expect?

bwengals · May 26, 2023, 6:58pm

Nothing jumps out as being concerning here. eta is pretty well within the 94% HDI.

To get some intuition on it, eta represents the amplitude variance, or how far “up and down” the process will go. In a time series context, if the process doesn’t run for a very long time relative to the lengthscale, ie, there aren’t many replications of the function’s ups and downs, then it’s not really possible constrain eta. If the lengthscale is short, say 2, and the GP runs for a while, say 500, then you’ll get much better estimates of eta because there are many examples of the GP reaching its maximums and minimums to estimate it’s overall variance.

In the two examples you tried, I’d say it’s on the short side. You can try extending the domain for a longer data set and you should see eta getting a narrower posterior.

giorgp · May 26, 2023, 7:19pm

Many thanks for your insightful response, your explanation makes sense. As mentioned above, I went through these examples in order to investigate some modelling I had done in Stan.

For reference, earlier today I reproduced the marginal GP example discussed in this post in Stan and got similar results as in pymc.

Topic		Replies	Views
Latent gaussian process prediction problem Questions gaussian_process	3	1085	January 25, 2022
How do you formulate a `Latent` GP that is equivalent to the `Marginal` implementation? Questions	5	489	May 26, 2021
MarginalSparse with low sigma as a workaround for sparse Latent GP Questions	6	773	April 25, 2018
GP and Bayesian linear regression in PyMC Questions	8	1420	August 18, 2018
Marginal likelihood implementation and measurement uncertainty Questions	12	851	November 2, 2019

Poor estimates from pymc examples on marginal and latent GP

Related topics