# GP and MC integral approximation - which logp to use?

Hi there everyone,
I am a little confused about how “logp” works for a model object in PyMC3. I have a simple model for a dataset y_1, \dots, y_n, given by y \sim GP(0, k_\theta(x, x')). I am trying to approximate the integral

p(y_1, \dots, y_n) = \int_\theta p(y_1, \dots, y_n|\theta)p(\theta)d\theta

using plain Monte Carlo, i.e. drawing \theta^{(m)} \sim p(\theta) for m = 1, \dots, M and averaging the likelihood values, \hat{p}(y_1, \dots, y_n) = \frac{1}{M}\sum_{m=1}^{M} p(y_1, \dots, y_n|\theta^{(m)}). For this I need to evaluate the likelihood p(y_1, \dots, y_n|\theta), and I am a little confused about how to get what I am looking for. My model is:

```python
import pymc3 as pm

# X1 (inputs) and y1 (observations) are assumed to be defined already
alpha = 2
priors = {"eta": ["gamma", alpha, 1],
          "rho": ["gamma", alpha, 1]}

with pm.Model() as gp_mass:
    eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)
    rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)

    # GP definition
    # mean
    gp_mean_1 = pm.gp.mean.Zero()

    # covariance (ExpQuad takes the input dimension, not the full shape)
    gp_cov_1 = eta1 ** 2 * pm.gp.cov.ExpQuad(X1.shape[1], ls=rho1, active_dims=[1, 2])

    # GP marginal (observed variables are never transformed, so no transform argument)
    gp_model_1 = pm.gp.Marginal(mean_func=gp_mean_1, cov_func=gp_cov_1)
    y_gp_1 = gp_model_1.marginal_likelihood("y1", X=X1, y=y1, noise=0.01)
```
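(For context on the transform=None arguments: PyMC3 log-transforms positive parameters by default, and the logp in the transformed space then includes a log-Jacobian term from the change of variables. A quick standalone scipy sketch of that change of variables, my own illustration rather than PyMC3 code:)

```python
# For theta ~ Gamma(2, 1) and eta = log(theta), the density of eta is
# p_theta(exp(eta)) * exp(eta), i.e. the log-density picks up an extra +eta
# (the log-Jacobian). This is the term transform=None removes.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

eta = np.linspace(-10, 6, 4001)  # grid in the transformed (log) space
logp_transformed = stats.gamma.logpdf(np.exp(eta), a=2, scale=1) + eta

# Sanity check: the transformed density still integrates to 1
area = trapezoid(np.exp(logp_transformed), eta)
print(round(area, 3))
```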


I set transform=None everywhere because https://github.com/pymc-devs/pymc3/issues/789 says that this is the way to get the correct likelihood evaluation (I need it exactly, not up to a constant). This is how I approximate the integral:

```python
import numpy as np
from tqdm import tqdm

n_mc = 10000
np.random.seed(0)

with pm.Model() as priors_model:

    if priors["eta"][0] == "gamma":
        eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)

    if priors["rho"][0] == "gamma":
        rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)

    trace_priors = pm.sample(n_mc, tune=10000, chains=1)

log_likelihood1 = np.empty(0)
mc_integral1 = np.empty(n_mc)

log_likelihood2 = np.empty(0)
mc_integral2 = np.empty(n_mc)

logp_1 = y_gp_1.logp  # logp of the observed GP variable
logp = gp_mass.logp   # logp of the whole model

# the compiled logp functions take a point dict, nothing else
for i in tqdm(range(n_mc), desc="Log likelihood eval"):
    log_likelihood1 = np.append(log_likelihood1, logp_1(trace_priors[i]))
    log_likelihood2 = np.append(log_likelihood2, logp(trace_priors[i]))

for i in tqdm(range(n_mc), desc="Integral calc"):
    mc_integral1[i] = np.sum(np.exp(log_likelihood1[:(i + 1)])) / (i + 1)
    mc_integral2[i] = np.sum(np.exp(log_likelihood2[:(i + 1)])) / (i + 1)
```
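One side note on the running means: exponentiating log-likelihoods directly can underflow to zero. A numerically safer variant (my own sketch, using dummy log-likelihood values in place of the real ones) keeps the running estimate in log space with scipy.special.logsumexp:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
log_likelihood1 = rng.normal(-800, 5, size=1000)  # dummy values; np.exp underflows here

# Running MC estimate in log space: log( (1/(i+1)) * sum_{j<=i} exp(ll_j) )
log_mc_integral1 = np.array([logsumexp(log_likelihood1[: i + 1]) - np.log(i + 1)
                             for i in range(len(log_likelihood1))])

print(np.exp(log_likelihood1).sum())        # direct exponentiation collapses to 0.0
print(np.isfinite(log_mc_integral1).all())  # the log-space version stays finite
```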


I am getting different results from gp_mass.logp and y_gp_1.logp, which is maybe not all that surprising, but which of these densities is the log of p(y_1, \dots, y_n|\theta)?

Thanks!

I see, based on https://docs.pymc.io/developer_guide.html#logp-and-dlogp, y_gp_1.logp is the one I am looking for. Am I right?

I think it should be gp_mass.logp, as your integral contains the priors of the parameters, which you also need to include.
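To see the difference concretely, here is a toy sketch with scipy.stats standing in for PyMC3's compiled logp functions (my own illustration, a single Gamma parameter with a Normal likelihood rather than your GP): the model logp evaluates log p(\theta) + log p(y|\theta), while the observed variable's logp evaluates log p(y|\theta) alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)  # toy data
theta = 0.7                        # one draw of the parameter

log_prior = stats.gamma.logpdf(theta, a=2, scale=1)          # log p(theta)
log_like = stats.norm.logpdf(y, loc=theta, scale=1.0).sum()  # log p(y | theta)
log_joint = log_prior + log_like

# The analogue of y_gp_1.logp is log_like; the analogue of gp_mass.logp is
# log_joint, so the two differ by exactly the prior term log p(theta).
print(np.isclose(log_joint - log_like, log_prior))
```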

Also, the marginal likelihood is generally difficult to compute; you can have a look at bridge sampling.
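To illustrate the idea on a case where the answer is known: in a conjugate Normal-Normal toy model (my own assumptions, not the GP model above) the marginal likelihood has a closed form, so the naive prior-sampling MC estimate can be checked against the truth. It works here with few data points, but its variance grows quickly as the likelihood concentrates relative to the prior, which is why methods like bridge sampling exist.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tau, sigma, n = 1.0, 1.0, 5
y = rng.normal(0.3, sigma, size=n)  # toy data

# Exact log marginal likelihood: theta ~ N(0, tau^2), y_i ~ N(theta, sigma^2)
# implies y ~ N(0, sigma^2 I + tau^2 11^T)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
log_evidence = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

# Naive MC estimate: average the likelihood over draws from the prior
M = 200_000
theta = rng.normal(0.0, tau, size=M)
log_like = stats.norm.logpdf(y[:, None], loc=theta, scale=sigma).sum(axis=0)
mc_estimate = np.log(np.mean(np.exp(log_like)))

print(abs(mc_estimate - log_evidence))  # small here, with only 5 data points
```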