GP Latent implementation in PyMC

I am trying to understand how latent GPs are implemented in PyMC. My immediate goal is to reproduce the computations shown here with PyMC version 5.0.

After some minor syntax changes from PyMC3 to PyMC, and while the full latent model samples (which apparently will take a while), I started playing with this example, mentioned in this post.

I tried to “translate” it to PyMC as follows:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pymc as pm
#import theano
#import theano.tensor as tt

np.random.seed(400)

# draw a sample from the GP (f_true); exp(f_true) is the true Poisson mean (mu_true) as a function of x_true
l_true = 1.0
eta_true = 1.0
x_true = np.linspace(0, 15, 900)
cov = eta_true**2 * pm.gp.cov.ExpQuad(1, l_true)
f_true = np.random.multivariate_normal(np.zeros(len(x_true)), cov(x_true[:,None]).eval(), 1).flatten()
mu_true = np.exp(f_true)
# make data set with different density of observed data in different places
idx1 = np.random.randint(0,250,250)
x1 = x_true[idx1]
y1 = np.random.poisson(mu_true[idx1])

idx2 = np.random.randint(250,350,10) 
x2 = x_true[idx2]
y2 = np.random.poisson(mu_true[idx2])

idx3 = np.random.randint(500,700,180) 
x3 = x_true[idx3]
y3 = np.random.poisson(mu_true[idx3])

idx4 = np.random.randint(700,900,10) 
x4 = x_true[idx4]
y4 = np.random.poisson(mu_true[idx4])

x = np.concatenate((x1, x2, x3, x4), axis=0)
idx = np.argsort(x)
x = x[idx]
y = np.concatenate((y1, y2, y3, y4), axis=0)[idx]

plt.figure(figsize=(16,6))
plt.plot(x_true, mu_true, 'y', label="True GP (Poisson mean)")
plt.plot(x, y, 'ko', label="Observed data", alpha=0.7);
plt.legend();

n = len(x)
X = x[:,None]

print("Number of data points:", n)

with pm.Model() as model:
    l = pm.Gamma("l", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, l)
    gp = pm.gp.Latent(cov_func=cov)
    
    f = gp.prior("f", X=X)
    
#    y_ = pm.Poisson("y", mu=tt.exp(f), observed=y)  # PyMC3/Theano version
    y_ = pm.Poisson("y", mu=pm.math.exp(f), observed=y)

    tr = pm.sample(1000, chains=1)

## so far so good

Xnew = np.linspace(-1, 20, 200)[:,None]

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

and here I get the error

ValueError: Variable name fnew already exists.

which leaves me baffled (sorry, I am a total beginner here), as the whole workflow seems to me identical to some GP examples I found, e.g. the "Homoskedastic GP" section in the first link above.

Any hints, please? Thanks a lot.

If you run

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

twice, you should trigger that error. The first line adds “fnew” to the model; once it is part of the model, you cannot add another variable with the same name as an existing one.
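
Alternatively, you can guard the cell so that it is safe to re-run. A minimal sketch (assuming the model and gp objects from above; model.named_vars is the dict in which PyMC registers variables by name):

with model:
    # add "fnew" only if no variable with that name exists yet,
    # so re-executing this cell does not raise the ValueError
    if "fnew" not in model.named_vars:
        fnew = gp.conditional("fnew", Xnew=Xnew)
    else:
        fnew = model.named_vars["fnew"]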

If you are programming in a Jupyter notebook, this sort of mistake is easy to make because there is no rigid order of execution for the cells. Simply restart your notebook (which clears variable definitions), re-execute, and you should be good. More generally, in notebooks, I like to separate those two lines into different cells:

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
with model:
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

to ensure that I can repeatedly run sample_posterior_predictive while defining fnew only once. Let me know if my hunch is correct!


Thank you very much for helping out.
I am running in a notebook, but I did restart the kernel and run the cells in succession many times.
I had some issues with notebooks in VS Code (I am on macOS), so I tried running it as a script, and I get the error

KeyError: fnew

I wonder if the issue is in the line

with model:
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

which should be

with model:
    full_preds = pm.sample_posterior_predictive(tr, var_names=["fnew"])

where “fnew” is the string name associated with fnew?
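
Putting it together, the corrected prediction cells would read something like this (a sketch, assuming PyMC 5’s default of returning an ArviZ InferenceData, so the draws end up in the posterior_predictive group):

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)

with model:
    # var_names expects the string names of model variables
    full_preds = pm.sample_posterior_predictive(tr, var_names=["fnew"])

# the draws for fnew are then in the posterior_predictive group
f_samples = full_preds.posterior_predictive["fnew"]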
Testing now, will report back. Many thanks