# GP and MC integral approximation - which logp to use?

Hi there everyone,
I am a little confused about how “logp” works for a model object in PyMC3. I have a simple model for a dataset y_1, \dots, y_n, given by y \sim GP(0, k_\theta(x, x')). I am trying to approximate the integral

p(y_1, \dots, y_n) = \int_\theta p(y_1, \dots, y_n|\theta)p(\theta)d\theta

using plain Monte Carlo, i.e. drawing \theta^{(m)} \sim p(\theta) for m = 1, \dots, M and averaging the likelihood values, \hat{p}(y_1, \dots, y_n) = \frac{1}{M}\sum_{m=1}^{M} p(y_1, \dots, y_n|\theta^{(m)}). For this I need to evaluate the likelihood p(y_1, \dots, y_n|\theta), and I am a little confused about how to get what I am looking for. My model is:

```python
import pymc3 as pm

# X1 (inputs) and y1 (observations) are assumed to be defined already
alpha = 2
priors = {"eta": ["gamma", alpha, 1],
          "rho": ["gamma", alpha, 1]}

with pm.Model() as gp_mass:
    eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)
    rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)

    # GP definition
    # mean
    gp_mean_1 = pm.gp.mean.Zero()

    # covariance (ExpQuad takes the input dimension, not the full shape)
    gp_cov_1 = eta1 ** 2 * pm.gp.cov.ExpQuad(X1.shape[1], ls=rho1, active_dims=[1, 2])

    # GP marginal (observed variables are never transformed, so no transform argument)
    gp_model_1 = pm.gp.Marginal(mean_func=gp_mean_1, cov_func=gp_cov_1)
    y_gp_1 = gp_model_1.marginal_likelihood("y1", X=X1, y=y1, noise=0.01)
```
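(For context on the transform=None arguments: PyMC3 log-transforms positive parameters by default, and the logp in the transformed space then includes a log-Jacobian term from the change of variables. A quick standalone scipy sketch of that change of variables, my own illustration rather than PyMC3 code:)

```python
# For theta ~ Gamma(2, 1) and eta = log(theta), the density of eta is
# p_theta(exp(eta)) * exp(eta), i.e. the log-density picks up an extra +eta
# (the log-Jacobian). This is the term transform=None removes.
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

eta = np.linspace(-10, 6, 4001)  # grid in the transformed (log) space
logp_transformed = stats.gamma.logpdf(np.exp(eta), a=2, scale=1) + eta

# Sanity check: the transformed density still integrates to 1
area = trapezoid(np.exp(logp_transformed), eta)
print(round(area, 3))
```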


I set transform=None everywhere because https://github.com/pymc-devs/pymc3/issues/789 says that this is the way to get the correct likelihood evaluation (I need it exactly, not up to a constant). This is how I approximate the integral:

```python
import numpy as np
from tqdm import tqdm

n_mc = 10000
np.random.seed(0)

with pm.Model() as priors_model:

    if priors["eta"][0] == "gamma":
        eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)

    if priors["rho"][0] == "gamma":
        rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)

    trace_priors = pm.sample(n_mc, tune=10000, chains=1)

log_likelihood1 = np.empty(0)
mc_integral1 = np.empty(n_mc)

log_likelihood2 = np.empty(0)
mc_integral2 = np.empty(n_mc)

logp_1 = y_gp_1.logp  # logp of the observed GP variable
logp = gp_mass.logp   # logp of the whole model

# the compiled logp functions take a point dict, nothing else
for i in tqdm(range(n_mc), desc="Log likelihood eval"):
    log_likelihood1 = np.append(log_likelihood1, logp_1(trace_priors[i]))
    log_likelihood2 = np.append(log_likelihood2, logp(trace_priors[i]))

for i in tqdm(range(n_mc), desc="Integral calc"):
    mc_integral1[i] = np.sum(np.exp(log_likelihood1[:(i + 1)])) / (i + 1)
    mc_integral2[i] = np.sum(np.exp(log_likelihood2[:(i + 1)])) / (i + 1)
```
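One side note on the running means: exponentiating log-likelihoods directly can underflow to zero. A numerically safer variant (my own sketch, using dummy log-likelihood values in place of the real ones) keeps the running estimate in log space with scipy.special.logsumexp:

```python
import numpy as np
from scipy.special import logsumexp

rng = np.random.default_rng(0)
log_likelihood1 = rng.normal(-800, 5, size=1000)  # dummy values; np.exp underflows here

# Running MC estimate in log space: log( (1/(i+1)) * sum_{j<=i} exp(ll_j) )
log_mc_integral1 = np.array([logsumexp(log_likelihood1[: i + 1]) - np.log(i + 1)
                             for i in range(len(log_likelihood1))])

print(np.exp(log_likelihood1).sum())        # direct exponentiation collapses to 0.0
print(np.isfinite(log_mc_integral1).all())  # the log-space version stays finite
```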


I am getting different results from gp_mass.logp and y_gp_1.logp, which is maybe not all that surprising, but which of these densities is the log of p(y_1, \dots, y_n|\theta)?

Thanks!

I see, based on https://docs.pymc.io/developer_guide.html#logp-and-dlogp, y_gp_1.logp is the one I am looking for. Am I right?

I think it should be gp_mass.logp, as your integral contains the priors of the parameters, which you also need to include.
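To see the difference concretely, here is a toy sketch with scipy.stats standing in for PyMC3's compiled logp functions (my own illustration, a single Gamma parameter with a Normal likelihood rather than your GP): the model logp evaluates log p(\theta) + log p(y|\theta), while the observed variable's logp evaluates log p(y|\theta) alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=20)  # toy data
theta = 0.7                        # one draw of the parameter

log_prior = stats.gamma.logpdf(theta, a=2, scale=1)          # log p(theta)
log_like = stats.norm.logpdf(y, loc=theta, scale=1.0).sum()  # log p(y | theta)
log_joint = log_prior + log_like

# The analogue of y_gp_1.logp is log_like; the analogue of gp_mass.logp is
# log_joint, so the two differ by exactly the prior term log p(theta).
print(np.isclose(log_joint - log_like, log_prior))
```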

Also, the marginal likelihood is generally difficult to compute; you can have a look at bridge sampling.
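To illustrate the idea on a case where the answer is known: in a conjugate Normal-Normal toy model (my own assumptions, not the GP model above) the marginal likelihood has a closed form, so the naive prior-sampling MC estimate can be checked against the truth. It works here with few data points, but its variance grows quickly as the likelihood concentrates relative to the prior, which is why methods like bridge sampling exist.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
tau, sigma, n = 1.0, 1.0, 5
y = rng.normal(0.3, sigma, size=n)  # toy data

# Exact log marginal likelihood: theta ~ N(0, tau^2), y_i ~ N(theta, sigma^2)
# implies y ~ N(0, sigma^2 I + tau^2 11^T)
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
log_evidence = stats.multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)

# Naive MC estimate: average the likelihood over draws from the prior
M = 200_000
theta = rng.normal(0.0, tau, size=M)
log_like = stats.norm.logpdf(y[:, None], loc=theta, scale=sigma).sum(axis=0)
mc_estimate = np.log(np.mean(np.exp(log_like)))

print(abs(mc_estimate - log_evidence))  # small here, with only 5 data points
```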