How to calculate log posterior of a GP over a trace in pymc3


#1

Use case

Suppose I have an observation y_0 at X_0 which I’d like to model with a Gaussian process with hyper params theta. Suppose I then determine a distribution in the hyper params theta by hierarchically sampling the marginal.

Now, I’d like to evaluate the log posterior probability of another observation say y_1 at X_1, averaged over the hyper param distribution,

E_theta [ log P(y_1 | y_0, X_0, X_1, theta) ]

Ideally, I’d draw from the posterior in theta and calculate log P(y_1 | y_0, X_0, X_1, theta) and then take the geometric mean.

Question

In pymc3 is there a way to create the tensor representing log P(y_1 | y_0 X_0 X_1 theta). Ideally, I would do something like (copy-pastable),

import numpy as np
import pylab as plt
import pymc3 as pm

# Data generation
X0 = np.linspace(0, 10, 100)[:,None]
y0 = X0**(0.5) + np.exp(-X0/5)*np.sin(10*X0)
y0 += 0.1*np.random.normal(size=y0.shape)
y0 = np.squeeze(y0)

# y1
X1 = np.linspace(0, 15, 200)[:,None]
y1 = X1**(0.5) + np.exp(-X1/6)*np.sin(8*X1)
y1 = np.squeeze(y1)

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    # I think given is not needed because it should be cached from above
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1})
    logp = pm.Deterministic('logp',y1_.logp(y1))
    trace = pm.sample(100)

This produce the traceback:

TypeError: can't turn [array([...]) and {} into a dict. cannot convert dictionary update sequence element #0 to a sequence

Any help greatly appreciated!


#2

Yes you can do it this way to save the logp into the trace, otherwise you can evaluate it outside of the model by creating/accessing the right function.

Some related discussion see Frequently Asked Questions

So you can do:

with model:
     logp_y1_ = pm.Deterministic('logp', y1_.logpt)

then do sample.

Or you can sample first, and then evaluate the logp of y1_ (notice that I also move the conditional outside, so the trace is smaller):

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    trace = pm.sample(100)

with model:
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1})
# evaluate logp on one input from the posterior:
point = trace[0] # get a point from trace
point['y1'] = y1
y1_.logp(point)

#3

Hmm, actually the first way logp_y1_ = pm.Deterministic('logp', y1_.logpt) is not quite right as it does not evaluate on the new observed y1


#4

Yes I wonder how that could be made right. It would be useful to do during the sampling.


#5

I think this should do the trick:

with model:
    ...
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1}, testval=y1)
    logp = pm.Deterministic('logp',y1_.logpt)
    ...

Explanation: If the value is not provided, logpt evaluated over the test value (default to 0 in this case). But just to be sure please verify it…


#6

I tried:

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    y1_ = gp.conditional('y1',X1,given={'X':X0,'y':y0,'noise':0.1},testval=y1)
    logp = pm.Deterministic('logp',y1_.logpt)
    trace = pm.sample(100)

With that I get that it’s trying to sample y1 too then the error at pm.sample:

TypeError: can't pickle fortran objects


#7

Try setting njobs=1


#8

It now samples but I’m still verifying that it is doing the right thing. For instance it says that it is sampling y1, when it shouldn’t be performing any sampling (unless it’s drawing from testval all the time), and it’s extremely slow compared to simply cycling over trace points and trying method 2.


#9

It is actually sampling y1: if you put it in the model, basically it is doing prediction with pi(ynew | Xnew, X, y, theta) and saved the prediction in the trace.


#10

That makes sense, but then y1 should always be sampling the same exact point given by testval. Upon inspecting the trace I find that y1 is taking on different values.


#11

I’m not positive this is right, (@junpenglao), but I think you could do this:

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    tr = pm.sample(100, chains=1)  # faster if you're using multithreaded blas

# don't include y1_ with the NUTS sampling if you don't need to
with model:
    y1_ = gp.conditional('y1', X1)
    logp = pm.Deterministic('logp', y1_.logp(y1))

# sample y1 using the trace
with model: 
    ppc = pm.sample_ppc(tr, 100, vars=[y1_])

Then, (I think this is right) you can get logp and y1 for, in this case, the first sample in the trace by doing

logp_val, y1_sample = pm.distributions.draw_values([logp, y1_], tr[0])

#12

using draw_values is a nice touch :wink: Small edit so it would work (tested on master):

import theano
ynew = theano.shared(y1)
with model:
    y1_ = gp.conditional('y1', X1)
    logp = pm.Deterministic('logp', y1_.distribution.logp(ynew))

logp_val, y1_sample = pm.distributions.draw_values([logp, y1_], tr[0])

The output is the same as doing below (from above reply):

point = tr[0] # get a point from trace
point['y1'] = y1
logp_val2 = y1_.logp(point)

#13

Nice @bwengals. It’s close except logp = pm.Deterministic('logp', y1_.logp(y1)) needs to be logp = pm.Deterministic('logp', y1_.logpt) to work for me, and then it will only evaluate logp at the conditioned y1. I want something slightly different and to evaluate logp at a given y1 (which in my case is the true underlying parameter since it is known from simulation - it gives a measure of ability of GP to reconstruct the underlying). I think @junpenglao has it with enforcing the shared tensor variable. I need to verify but if it is returning the same thing as:

point = tr[0] # get a point from trace
point['y1'] = y1
logp_val2 = y1_.logp(point)

then it should be want I’m looking for. I’ll verify and get back.


#14

Okay so I’m satisfied with the solution. Here is the code demonstrating the equivalence and timing results. Method 2 (see below) is fastest by about 2-3 times.

import numpy as np
import pylab as plt
import pymc3 as pm
from theano import shared
from timeit import default_timer
plt.style.use('ggplot')

# Data generation
X0 = np.linspace(0, 10, 100)[:,None]
y0 = X0**(0.5) + np.exp(-X0/5)*np.sin(10*X0)
y0 += 0.1*np.random.normal(size=y0.shape)
y0 = np.squeeze(y0)

# y1
X1 = np.linspace(0, 15, 200)[:,None]
y1 = X1**(0.5) + np.exp(-X1/5)*np.sin(10*X1)
y1 = np.squeeze(y1)

plt.scatter(X0,y0)
plt.plot(X1,y1)
plt.show()

with pm.Model() as model:
    l = pm.HalfNormal('l',5.)
    cov_func = pm.gp.cov.ExpQuad(1, ls=l,)
    gp = pm.gp.Marginal(cov_func=cov_func)
    y0_ = gp.marginal_likelihood('y0',X0,y0,0.1)
    trace = pm.sample(1000,chains=4)
pm.traceplot(trace)
plt.show()

with model:
    y1_ = gp.conditional('y1',X1)#,given={'X':X0,'y':y0,'noise':0.1})
    
###
# Method 1

t1 = default_timer()
y1_shr = shared(y1)
with model:
    logp = pm.Deterministic('logp', y1_.distribution.logp(y1_shr))
logp_val2 = [pm.distributions.draw_values([logp], point) for point in trace]
print(len(trace)/(default_timer()-t1))
# prints 61

###
# Method 2
t1 = default_timer()
logp = y1_.logp
logp_val1 = []
for point in trace:
    point['y1'] = y1
    logp_val1.append(logp(point))
print(len(trace)/(default_timer()-t1))
# prints 162

fig,axs = plt.subplots(2)
axs[0].hist(np.ravel(logp_val1),alpha=0.5)
axs[1].hist(np.ravel(logp_val2),alpha=0.5)
plt.show()

assert np.all(np.squeeze(logp_val2)==np.squeeze(logp_val1))
###
# note logp_val1 is list of list of arrays, and logp_val2 is list of arrays
# hence the squeeze

#15

Nice! Thanks for the summary.