GP Latent implementation in PyMC

I am trying to understand how latent GPs are implemented in PyMC. My immediate goal is to reproduce the computations shown here with PyMC version 5.0.

After some minor syntax changes from PyMC3 to PyMC, and while the full latent model samples (which apparently will take a while), I started playing with this example, mentioned in this post.

I tried to “translate” it to PyMC as follows:

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import pymc as pm
#import theano
#import theano.tensor as tt

np.random.seed(400)

# draw a sample from the GP (f_true); exp(f_true) is the true Poisson mean (mu_true) as a function of x_true
l_true = 1.0
eta_true = 1.0
x_true = np.linspace(0, 15, 900)
cov = eta_true**2 * pm.gp.cov.ExpQuad(1, l_true)
f_true = np.random.multivariate_normal(np.zeros(len(x_true)), cov(x_true[:,None]).eval(), 1).flatten()
mu_true = np.exp(f_true)
# make data set with different density of observed data in different places
idx1 = np.random.randint(0,250,250)
x1 = x_true[idx1]
y1 = np.random.poisson(mu_true[idx1])

idx2 = np.random.randint(250,350,10) 
x2 = x_true[idx2]
y2 = np.random.poisson(mu_true[idx2])

idx3 = np.random.randint(500,700,180) 
x3 = x_true[idx3]
y3 = np.random.poisson(mu_true[idx3])

idx4 = np.random.randint(700,900,10) 
x4 = x_true[idx4]
y4 = np.random.poisson(mu_true[idx4])

x = np.concatenate((x1, x2, x3, x4), axis=0)
idx = np.argsort(x)
x = x[idx]
y = np.concatenate((y1, y2, y3, y4), axis=0)[idx]

plt.figure(figsize=(16,6))
plt.plot(x_true, mu_true, 'y', label="True GP (Poisson mean)")
plt.plot(x, y, 'ko', label="Observed data", alpha=0.7);
plt.legend();

n = len(x)
X = x[:,None]

print("Number of data points:", n)

with pm.Model() as model:
    l = pm.Gamma("l", alpha=2, beta=1)
    eta = pm.HalfNormal("eta", sigma=1)
    cov = eta**2 * pm.gp.cov.ExpQuad(1, l)
    gp = pm.gp.Latent(cov_func=cov)
    
    f = gp.prior("f", X=X)
    
#    y_ = pm.Poisson("y", mu=tt.exp(f), observed=y)  # PyMC3/Theano version
    y_ = pm.Poisson("y", mu=pm.math.exp(f), observed=y)

    tr = pm.sample(1000, chains=1)

## so far so good

Xnew = np.linspace(-1, 20, 200)[:,None]

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

and here I get the error

ValueError: Variable name fnew already exists.

which leaves me baffled (sorry, I am a total beginner here), as the whole workflow seems to me identical to some GP examples I found, e.g. the "Homoskedastic GP" section in the first link above.

Any hints, please? Thanks a lot.

If you run

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

twice, you should trigger that error. The first line adds “fnew” to the model; once it is part of the model, you cannot add another variable with the same name as an existing one.
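
Alternatively, you can guard the cell so that it is safe to re-run. A minimal sketch (assuming the model and gp objects from above; model.named_vars is the dict in which PyMC registers variables by name):

with model:
    # add "fnew" only if no variable with that name exists yet,
    # so re-executing this cell does not raise the ValueError
    if "fnew" not in model.named_vars:
        fnew = gp.conditional("fnew", Xnew=Xnew)
    else:
        fnew = model.named_vars["fnew"]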

If you are programming in a Jupyter notebook, this sort of mistake is easy to make because there is no rigid order of execution for the cells. Simply restart your notebook (which clears variable definitions), re-execute, and you should be good. More generally, in notebooks, I like to separate those two lines into different cells:

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)
with model:
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

to ensure that I can repeatedly run sample_posterior_predictive while defining fnew only once. Let me know if my hunch is correct!


Thank you very much for helping out.
I am running in a notebook, but I did restart the kernel and run the cells in succession many times.
I had some issues with notebooks in VS Code (I am on macOS), so I tried running it as a script, and I get the error

KeyError: fnew

I wonder if the issue is in the line

with model:
    full_preds = pm.sample_posterior_predictive(tr, vars=[fnew])

which should be

with model:
    full_preds = pm.sample_posterior_predictive(tr, var_names=["fnew"])

where “fnew” is the string name associated with fnew?
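
Putting it together, the corrected prediction cells would read something like this (a sketch, assuming PyMC 5’s default of returning an ArviZ InferenceData, so the draws end up in the posterior_predictive group):

with model:
    fnew = gp.conditional("fnew", Xnew=Xnew)

with model:
    # var_names expects the string names of model variables
    full_preds = pm.sample_posterior_predictive(tr, var_names=["fnew"])

# the draws for fnew are then in the posterior_predictive group
f_samples = full_preds.posterior_predictive["fnew"]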
Testing now, will report back. Many thanks