Hi there everyone,
I am a little confused about how `logp` works for a model object in PyMC3. I have a simple model for a dataset y_1, \dots, y_n given by y \sim GP(0, k_\theta(x,x')). I am trying to approximate the integral
p(y_1, \dots, y_n) = \int_\theta p(y_1, \dots, y_n|\theta)p(\theta)d\theta
using plain Monte Carlo, i.e., I need to evaluate the likelihood p(y_1, \dots, y_n|\theta), and I am a little confused about how to get what I am looking for. My model is:
```python
alpha = 2
priors = {"eta": ["gamma", alpha, 1],
          "rho": ["gamma", alpha, 1]}

with pm.Model() as gp_mass:
    eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)
    rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)
    # GP definition
    # mean
    gp_mean_1 = pm.gp.mean.Zero()
    # covariance
    gp_cov_1 = eta1 ** 2 * pm.gp.cov.ExpQuad(X1.shape[1], ls=rho1, active_dims=[1, 2])
    # GP
    gp_model_1 = pm.gp.Marginal(mean_func=gp_mean_1, cov_func=gp_cov_1)
    y_gp_1 = gp_model_1.marginal_likelihood("y1", X=X1, y=y1, noise=0.01, transform=None)
```
I set `transform=None` everywhere because https://github.com/pymc-devs/pymc3/issues/789 says that is the way to get the likelihood evaluated exactly (I need it exactly, not up to a constant). This is how I approximate the integral:
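To be explicit, the estimator I have in mind is the plain Monte Carlo average of the likelihood over draws from the prior:

p(y_1, \dots, y_n) \approx \frac{1}{N} \sum_{i=1}^{N} p(y_1, \dots, y_n \mid \theta^{(i)}), \quad \theta^{(i)} \sim p(\theta)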
```python
n_mc = 10000
np.random.seed(0)

with pm.Model() as priors_model:
    if priors["eta"][0] == "gamma":
        eta1 = pm.Gamma("eta1", alpha=priors["eta"][1], beta=priors["eta"][2], transform=None)
    if priors["rho"][0] == "gamma":
        rho1 = pm.Gamma("rho1", alpha=priors["rho"][1], beta=priors["rho"][2], shape=2, transform=None)
    trace_priors = pm.sample(n_mc, tune=10000, chains=1)

log_likelihood1 = np.empty(n_mc)
mc_integral1 = np.empty(n_mc)
log_likelihood2 = np.empty(n_mc)
mc_integral2 = np.empty(n_mc)

# Compile the logp functions once, outside the loop
logp_1 = y_gp_1.logp
logp = gp_mass.logp

for i in tqdm(range(n_mc), desc="Log likelihood eval"):
    # logp takes a point (a dict of variable values); it has no transform argument
    log_likelihood1[i] = logp_1(trace_priors[i])
    log_likelihood2[i] = logp(trace_priors[i])

for i in tqdm(range(n_mc), desc="Integral calc"):
    mc_integral1[i] = np.sum(np.exp(log_likelihood1[:(i + 1)])) / (i + 1)
    mc_integral2[i] = np.sum(np.exp(log_likelihood2[:(i + 1)])) / (i + 1)
```
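As an aside (not specific to PyMC3): averaging `np.exp(log_likelihood)` directly can underflow to zero when the log-likelihoods are very negative, which is common for GP marginal likelihoods on more than a handful of points. Keeping the running average in log space with `scipy.special.logsumexp` avoids this. A minimal sketch, using made-up log-likelihood values in place of `logp_1(trace_priors[i])`:

```python
import numpy as np
from scipy.special import logsumexp

# Hypothetical log-likelihood values standing in for logp_1(trace_priors[i])
log_likelihood = np.array([-1050.0, -1052.3, -1049.1, -1051.7])

# Running Monte Carlo estimate kept in log space:
# log( (1/n) * sum_i exp(ll_i) ) = logsumexp(ll_1..n) - log(n)
n = np.arange(1, log_likelihood.size + 1)
log_mc_integral = np.array(
    [logsumexp(log_likelihood[:i]) - np.log(i) for i in n]
)

# Direct averaging underflows to exactly 0.0 in float64,
# while the log-space running estimate stays finite.
print(np.exp(log_likelihood).mean())  # 0.0 (underflow)
print(log_mc_integral[-1])
```

If you only need to compare estimates (e.g. for a Bayes factor), you can work with `log_mc_integral` directly and never exponentiate.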
I am getting different results depending on whether I use `gp_mass.logp` or `y_gp_1.logp`, which is maybe not all that surprising, but which of these densities is the log of p(y_1, \dots, y_n|\theta)?
Thanks!