How to calculate log posterior of a GP over a trace in pymc3

Joshuaalbert · February 1, 2018, 9:32pm

Use case

Suppose I have an observation y_0 at X_0 which I’d like to model with a Gaussian process with hyper params theta. Suppose I then determine a distribution in the hyper params theta by hierarchically sampling the marginal.

Now, I’d like to evaluate the log posterior probability of another observation say y_1 at X_1, averaged over the hyper param distribution,

E_theta [ log P(y_1 | y_0, X_0, X_1, theta) ]

Ideally, I’d draw from the posterior in theta and calculate log P(y_1 | y_0, X_0, X_1, theta) and then take the geometric mean.

Question

In pymc3 is there a way to create the tensor representing log P(y_1 | y_0 X_0 X_1 theta). Ideally, I would do something like (copy-pastable),

import numpy as np
import pylab as plt
import pymc3 as pm

# Data generation
X0 = np.linspace(0, 10, 100)[:,None]
y0 = X0**(0.5) + np.exp(-X0/5)*np.sin(10*X0)
y0 += 0.1*np.random.normal(size=y0.shape)
y0 = np.squeeze(y0)

# y1
X1 = np.linspace(0, 15, 200)[:,None]
y1 = X1**(0.5) + np.exp(-X1/6)*np.sin(8*X1)
y1 = np.squeeze(y1)

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    # I think given is not needed because it should be cached from above
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1})
    logp = pm.Deterministic('logp',y1_.logp(y1))
    trace = pm.sample(100)

This produce the traceback:

TypeError: can't turn [array([...]) and {} into a dict. cannot convert dictionary update sequence element #0 to a sequence

Any help greatly appreciated!

junpenglao · February 1, 2018, 10:08pm

Yes you can do it this way to save the logp into the trace, otherwise you can evaluate it outside of the model by creating/accessing the right function.

Some related discussion see Frequently Asked Questions

So you can do:

with model:
     logp_y1_ = pm.Deterministic('logp', y1_.logpt)

then do sample.

Or you can sample first, and then evaluate the logp of y1_ (notice that I also move the conditional outside, so the trace is smaller):

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    trace = pm.sample(100)

with model:
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1})
# evaluate logp on one input from the posterior:
point = trace[0] # get a point from trace
point['y1'] = y1
y1_.logp(point)

junpenglao · February 1, 2018, 10:18pm

Hmm, actually the first way logp_y1_ = pm.Deterministic('logp', y1_.logpt) is not quite right as it does not evaluate on the new observed y1…

Joshuaalbert · February 1, 2018, 10:18pm

Yes I wonder how that could be made right. It would be useful to do during the sampling.

junpenglao · February 1, 2018, 10:23pm

I think this should do the trick:

with model:
    ...
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1}, testval=y1)
    logp = pm.Deterministic('logp',y1_.logpt)
    ...

Explanation: If the value is not provided, logpt evaluated over the test value (default to 0 in this case). But just to be sure please verify it…

Joshuaalbert · February 1, 2018, 10:34pm

I tried:

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1},testval=y1)
    logp = pm.Deterministic('logp',y1_.logpt)
    trace = pm.sample(100)

With that I get that it’s trying to sample y1 too then the error at pm.sample:

TypeError: can't pickle fortran objects

junpenglao · February 1, 2018, 10:38pm

Try setting njobs=1

Joshuaalbert · February 1, 2018, 10:44pm

It now samples but I’m still verifying that it is doing the right thing. For instance it says that it is sampling y1, when it shouldn’t be performing any sampling (unless it’s drawing from testval all the time), and it’s extremely slow compared to simply cycling over trace points and trying method 2.

junpenglao · February 1, 2018, 10:47pm

It is actually sampling y1: if you put it in the model, basically it is doing prediction with pi(ynew | Xnew, X, y, theta) and saved the prediction in the trace.

Joshuaalbert · February 1, 2018, 11:14pm

That makes sense, but then y1 should always be sampling the same exact point given by testval. Upon inspecting the trace I find that y1 is taking on different values.

bwengals · February 2, 2018, 1:19am

I’m not positive this is right, (@junpenglao), but I think you could do this:

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    tr = pm.sample(100, chains=1)  # faster if you're using multithreaded blas

# don't include y1_ with the NUTS sampling if you don't need to
with model:
    y1_ = gp.conditional('y1', X1)
    logp = pm.Deterministic('logp', y1_.logp(y1))

# sample y1 using the trace
with model: 
    ppc = pm.sample_ppc(tr, 100, vars=[y1_])

Then, (I think this is right) you can get logp and y1 for, in this case, the first sample in the trace by doing

logp_val, y1_sample = pm.distributions.draw_values([logp, y1_], tr[0])

junpenglao · February 2, 2018, 10:53am

using draw_values is a nice touch Small edit so it would work (tested on master):

import theano
ynew = theano.shared(y1)
with model:
    y1_ = gp.conditional('y1', X1)
    logp = pm.Deterministic('logp', y1_.distribution.logp(ynew))

logp_val, y1_sample = pm.distributions.draw_values([logp, y1_], tr[0])

The output is the same as doing below (from above reply):

point = tr[0] # get a point from trace
point['y1'] = y1
logp_val2 = y1_.logp(point)

Joshuaalbert · February 2, 2018, 11:15am

Nice @bwengals. It’s close except logp = pm.Deterministic('logp', y1_.logp(y1)) needs to be logp = pm.Deterministic('logp', y1_.logpt) to work for me, and then it will only evaluate logp at the conditioned y1. I want something slightly different and to evaluate logp at a given y1 (which in my case is the true underlying parameter since it is known from simulation - it gives a measure of ability of GP to reconstruct the underlying). I think @junpenglao has it with enforcing the shared tensor variable. I need to verify but if it is returning the same thing as:

point = tr[0] # get a point from trace
point['y1'] = y1
logp_val2 = y1_.logp(point)

then it should be want I’m looking for. I’ll verify and get back.

Joshuaalbert · February 2, 2018, 1:13pm

Okay so I’m satisfied with the solution. Here is the code demonstrating the equivalence and timing results. Method 2 (see below) is fastest by about 2-3 times.

import numpy as np
import pylab as plt
import pymc3 as pm
from theano import shared
from timeit import default_timer
plt.style.use('ggplot')

# Data generation
X0 = np.linspace(0, 10, 100)[:,None]
y0 = X0**(0.5) + np.exp(-X0/5)*np.sin(10*X0)
y0 += 0.1*np.random.normal(size=y0.shape)
y0 = np.squeeze(y0)

# y1
X1 = np.linspace(0, 15, 200)[:,None]
y1 = X1**(0.5) + np.exp(-X1/5)*np.sin(10*X1)
y1 = np.squeeze(y1)

plt.scatter(X0,y0)
plt.plot(X1,y1)
plt.show()

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l,)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    trace = pm.sample(1000,chains=4)
pm.traceplot(trace)
plt.show()

with model:
    y1_ = gp.conditional('y1',X1)#,given={'X':X0,'y':y0,'noise':0.1})
    
###
# Method 1

t1 = default_timer()
y1_shr = shared(y1)
with model:
    logp = pm.Deterministic('logp', y1_.distribution.logp(y1_shr))
logp_val2 = [pm.distributions.draw_values([logp], point) for point in trace]
print(len(trace)/(default_timer()-t1))
# prints 61

###
# Method 2
t1 = default_timer()
logp = y1_.logp
logp_val1 = []
for point in trace:
    point['y1'] = y1
    logp_val1.append(logp(point))
print(len(trace)/(default_timer()-t1))
# prints 162

fig,axs = plt.subplots(2)
axs[0].hist(np.ravel(logp_val1),alpha=0.5)
axs[1].hist(np.ravel(logp_val2),alpha=0.5)
plt.show()

assert np.all(np.squeeze(logp_val2)==np.squeeze(logp_val1))
###
# note logp_val1 is list of list of arrays, and logp_val2 is list of arrays
# hence the squeeze

junpenglao · February 2, 2018, 3:47pm

Nice! Thanks for the summary.

Topic		Replies	Views
GP and MC integral approximation - which logp to use? Questions	2	559	February 12, 2019
Bayesian calibration on GP model v3 gaussian_process	7	1082	July 23, 2022
Gaussian Process predict over trace Questions	2	811	September 4, 2018
Bayesian Analysis of Posterior Predictions from GP model Questions	17	1333	August 10, 2020
Evaluating log posterior (or log prior/log likelihood) of model at arbitrary inputs v5 development	1	489	August 29, 2022

How to calculate log posterior of a GP over a trace in pymc3

Use case

Question

Related topics