Latent Gaussian Process sampling

I am using a Latent Gaussian Process with Negative Binomial likelihood. I am running into an issue where sampling from the posterior gives almost identical f values in all dimensions (around ±35, which is the square root of the mean value of my observed dataset). I understand that each dimension of f should converge over time, but I don’t think that every dimension should be the same. Any ideas why this is happening?

Here is my code and the trace plot it produces:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    a = pm.TruncatedNormal('amplitude', mu=1, sigma=10, lower=0)
    l = pm.TruncatedNormal('time-scale', mu=10, sigma=10, lower=0)
    cov_func = a**2 * pm.gp.cov.ExpQuad(1, ls=l)  # covariance class was cut off in the original post; ExpQuad assumed

    gp = pm.gp.Latent(cov_func=cov_func)

    f = gp.prior('f', X=t)

    alpha = pm.TruncatedNormal('alpha', mu=500, sigma=500, lower=0)
    y_ = pm.NegativeBinomial('y', mu=tt.square(f)+1e-6, alpha=alpha, observed=y)

    trace = pm.sample(500, chains=1, tune=1000, target_accept=.90)

Hard to tell exactly without seeing your data, but it’s interesting that f is mirrored. What happens if you use tt.exp(f) instead of tt.square(f) as your link function?
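One way to see why the square link could produce that mirrored pattern: squaring is not one-to-one, so f and -f imply exactly the same Negative Binomial mean, while exp keeps the sign identified. A toy sketch (values are illustrative):

```python
import math

# square() maps f and -f to the same mean, so the posterior cannot
# distinguish the two signs -- consistent with f mirroring around +/-35.
# exp() is one-to-one, so the sign of f stays identified.
f_pos, f_neg = 2.0, -2.0

assert f_pos**2 == f_neg**2                # same NB mean under the square link
assert math.exp(f_pos) != math.exp(f_neg)  # distinct means under exp
```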

I tried with tt.exp(f) and f is now clustered around 7.1, which is the log of the mean of my dataset.

It looks like the model is not attributing much of the variation to temporal correlations in the GP. It might be informative to make plots of the rolling mean (or a smoothed version with bandwidth close to time-scale) of your data across time against the posterior estimates of exp(f). If these two quantities are similar, then there may not be much else you can do without changing some prior assumptions. Also, are you including an intercept term in your GP? It might help us to understand what’s going on if the values of f are clearly partitioned into a mean value and deviations about that mean due to temporal correlations.
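A minimal sketch of that rolling-mean diagnostic in pure Python (the window width, data, and variable names are illustrative; in practice pick a window close to the posterior time-scale):

```python
def rolling_mean(values, window):
    """Centered rolling mean; edges use the available partial window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Toy counts; with real data you would plot both curves across time.
y_obs = [10, 12, 9, 30, 35, 33, 11, 10]
smoothed = rolling_mean(y_obs, window=3)
# If the posterior mean of exp(f) tracks `smoothed`, the GP is already
# capturing most of the usable temporal structure.
```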


I realized I made a silly mistake… I converted both my x's and y's to column vectors using [:,None], but that's only necessary for the x's. Thank you anyway for all your help!
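For reference, the shapes that work (data and variable names here are illustrative):

```python
import numpy as np

# Illustrative data; the point is the shapes, not the values.
t_raw = np.arange(5.0)
y_raw = np.array([3, 7, 2, 9, 4])

t = t_raw[:, None]  # (5, 1): gp.prior(X=t) expects a 2-D input
y = y_raw           # (5,):   observed counts should stay 1-D

assert t.shape == (5, 1)
assert y.shape == (5,)
```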


Curious, why did you choose the Latent GP with a Negative Binomial likelihood (as opposed to the GP-specific likelihood options, like conditional or marginal_likelihood)? I'm relatively new to GPs, so I'm asking as a learning opportunity, not to poke holes in your analysis.

@jbuddy_13 Thanks for the question!

For my particular problem, I wanted to fit the model to a time series of counts, so I needed a likelihood over non-negative integers. The Latent GP model let me use a Gaussian process as the underlying latent function and generalize it with my chosen discrete distribution (Negative Binomial). The Marginal GP model, on the other hand, assumes the noise on the underlying Gaussian process is normally distributed, so it didn't offer the flexibility I needed to customize the model for my data.

These are the two main GP implementations PyMC3 offers; my understanding of them comes from their descriptions in the docs.

Because I chose to use a separate distribution on top of the Latent GP, I didn’t need the built-in conditional and marginal_likelihood methods. I used the prior method to initialize the latent GP function, and the Negative Binomial distribution to evaluate the model’s fit to my data.
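For intuition, here is a small numpy sketch of the conditional mean that gp.conditional would compute for a Latent GP at new inputs, given one posterior draw of f (the kernel, jitter, and data below are illustrative assumptions, not my actual model):

```python
import numpy as np

def exp_quad(xa, xb, amp=1.0, ls=1.0):
    """Exponentiated-quadratic covariance between two 1-D point sets."""
    d = xa[:, None] - xb[None, :]
    return amp**2 * np.exp(-0.5 * (d / ls) ** 2)

t = np.linspace(0, 4, 5)      # training inputs
t_new = np.array([1.5, 2.5])  # prediction inputs
f = np.sin(t)                 # stand-in for one posterior draw of f

K = exp_quad(t, t) + 1e-8 * np.eye(len(t))  # jitter for numerical stability
K_s = exp_quad(t_new, t)
f_new_mean = K_s @ np.linalg.solve(K, f)    # conditional mean at t_new
```

The conditional mean interpolates the latent draw, so here it lands close to sin evaluated at the new points.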


Ah, I see- thank you!