GP model and prior prediction analysis

Hi,

My first steps of using pymc3 were very Successful. However, now I got into the more complicated aspects of Bayes using pymc3.

I aim to train a classifier (using the Binomial model), six predictors, and a gaussian process for integrating the information from a distance map.

For assessing the feasibility of the project, I have genreated training data:
predictors:

x = np.array((0.06,.01,.02,0.7,0.9,0.8))

Output:
corresponding to six classes:

y =  np.array((0, 0, 0, 1, 1, 1)) 

Extra information is extracted from the distance map (column are corresponidng to the order of the predictors and output:

Dmat = np.array([[0,1,2,8,3,7],
                             [1,0,3,9,4,7],
                             [2,2,0,5,3,4],
		             [8,9,5,0,2,1],
		             [3,4,3,2,0,2],
		             [7,7,4,1,2,0]]) 

I will need some help to reverse engineer the model for troubleshooting.

with pm.Model() as gauess_sim_test:
    # hyper-priors
    
    etasq = pm.Exponential("etasq", 10)
    ls_inv = pm.HalfNormal("ls_inv", 1)
    rhosq = pm.Deterministic("rhosq", 0.5 * ls_inv**2)

    # Implementation with PyMC's GP module:
    cov = rhosq **2 * pm.gp.cov.ExpQuad(input_dim=1, ls_inv=ls_inv)
    gp = pm.gp.Latent(cov_func=cov)
    k = gp.prior("k", X=Dmat)
    
    p = pm.Deterministic("p", pm.invlogit(k*x))
    prob = pm.Binomial("prob", 1, p, observed=y)
    
    step = pm.Metropolis()
    trace = pm.sample(
        1, step, progressbar=True
    )  # Make sure not to draw too many samples

I want to apply the cov function to select the most probable priors. It is unclear how to retrieve the output of the function (pm.gp.cov.ExpQuad) ?
In addition, the next step is not clear to me…what is the output of pm.gp.Latent? In the last GP step, priors are drawn from the multiplication of the calculated cov matrix and the distance matrix (X). Is it correct?

The model was built on the base of the Statistical rethinking book (which I love), second edition page 469 (Tools example), and the awesome GitHub support of the book (pymc-resources/Chp_14.ipynb at main · pymc-devs/pymc-resources · GitHub) code 14.39.

I hope I was clear, and thank you for this excellent forum.

how to retrieve the output of the function (pm.gp.cov.ExpQuad) ?

import pymc as pm
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

etasq = 1
cov = etasq * pm.gp.cov.ExpQuad(1, ls=2)

# .eval() evaluates the aesara op.  There will be a pause the first time as aesara compiles the function
Kxx = cov(x[:, None]).eval()  

plt.imshow(Kxx);

what is the output of pm.gp.Latent?

Just the GP class. The actual GP you’re probably interested in comes from the call to gp.prior.

In the last GP step, priors are drawn from the multiplication of the calculated cov matrix and the distance matrix (X). Is it correct?

No, not quite… I think you’d say that a GP is the prior over your function k. Samples from the prior are drawn from a multivariate normal with mean 0 and covariance function cov.

Small things:

  • I think you’re using rhosq as the scale parameter in cov when it should be etasq instead.
  • Would def recommend sticking with the default NUTS as your sampler here. Metropolis without a carefully tuned proposal distribution will not sample a GP well.
1 Like