Poor estimates from pymc examples on marginal and latent GP

Hi all, in a different thread I summarised the issues I had when trying to translate my stan code to pymc. To investigate what’s going wrong, I decided to take a step back and try to reproduce the pymc examples of marginal and latent GP formulations. However, my posteriors are quite poor, as you can see below:

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pymc as pm

# set the seed
np.random.seed(1)

n = 100  # The number of data points
X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP, they must be arranged as a column vector

# Define the true covariance function and its parameters
ℓ_true = 1.0
η_true = 3.0
cov_func = η_true**2 * pm.gp.cov.Matern52(1, ℓ_true)

# A mean function that is zero everywhere
mean_func = pm.gp.mean.Zero()

# The latent function values are one sample from a multivariate normal
# Note that we have to call `eval()` because PyMC3 built on top of Theano
f_true = np.random.multivariate_normal(
    mean_func(X).eval(), cov_func(X).eval() + 1e-8 * np.eye(n), 1
).flatten()

# The observed data is the latent function plus a small amount of IID Gaussian noise
# The standard deviation of the noise is `sigma`
σ_true = 2.0
y = f_true + σ_true * np.random.randn(n)

## Plot the data and the unobserved latent function
fig = plt.figure(figsize=(12, 5))
ax = fig.gca()
ax.plot(X, f_true, "dodgerblue", lw=3, label="True f")
ax.plot(X, y, "ok", ms=3, alpha=0.5, label="Data")
ax.set_xlabel("X")
ax.set_ylabel("The true f(x)")
plt.legend();

with pm.Model() as model_marginal:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)

    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    σ = pm.HalfCauchy("σ", beta=5)
    y_ = gp.marginal_likelihood("y", X=X, y=y, sigma=σ)

    idata_marginal = pm.sample(1000, tune=1000, chains=2, cores=1)

az.summary(idata_marginal, var_names = ["ℓ", "η", "σ"], round_to=2)

the output is:

mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
1.20 0.32 0.66 1.78 0.01 0.01 1100.90 802.66 1.0
η 4.51 1.63 2.30 7.31 0.05 0.04 1144.79 1067.48 1.0
σ 1.93 0.15 1.66 2.21 0.00 0.00 1732.24 1234.32 1.0

As you can see, the mean estimate of η (4.51) is roughly 1sd away from the true value (3.00). I was expecting the estimate to be quite a lot better. The results from the latent formulation are comparably bad. Interestingly, the MAP estimates (not included above) are better than the MCMC sample estimates.

Am I doing something wrong, or is this the level of agreement you would expect?

Nothing jumps out as being concerning here. eta is pretty well within the 94% HDI.

To get some intuition on it, eta represents the amplitude variance, or how far “up and down” the process will go. In a time series context, if the process doesn’t run for a very long time relative to the lengthscale, ie, there aren’t many replications of the function’s ups and downs, then it’s not really possible constrain eta. If the lengthscale is short, say 2, and the GP runs for a while, say 500, then you’ll get much better estimates of eta because there are many examples of the GP reaching its maximums and minimums to estimate it’s overall variance.

In the two examples you tried, I’d say it’s on the short side. You can try extending the domain for a longer data set and you should see eta getting a narrower posterior.

1 Like

Many thanks for your insightful response, your explanation makes sense. As mentioned above, I went through these examples in order to investigate some modelling I had done in Stan.

For reference, earlier today I reproduced the marginal GP example discussed in this post in Stan and got similar results as in pymc.