# Regression using RBF kernel

I’m trying to implement an RBF kernel for regression (similar to the kernel used in sklearn's SVR). I want to capture non-linear relationships between the predictors and the dependent variable.

```
import pymc3 as pm
import theano.tensor as tt

gamma = 1
with pm.Model() as model:
    x_i = pm.Normal("x_i", mu=0, sd=100, shape=X.shape[1])
    w = pm.Normal("w", mu=0, sd=100, shape=X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu = tt.dot(w, tt.exp(-gamma * tt.square(tt.sub(X, x_i))).T)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)
```

But this does not seem to be working. I may have misunderstood the idea of the RBF kernel.

Any help to understand the correct implementations is greatly appreciated.

Thanks!!

Why not use a Gaussian process with the squared exponential covariance function? The two are equivalent.

If you want to recreate the RBF regression, you can try rewriting the example from Edward into PyMC3: http://edwardlib.org/tutorials/supervised-classification with the rbf function.

1 Like

Thanks for the suggestion. I implemented that, but now I don’t understand how to make the predictions, because the number of predictions is always equal to the number of samples used for training.

I used the same rbf function and this is the PyMC3 implementation.

```
with pm.Model() as model:
    cov = tf.cholesky(rbf(X.get_value().astype(np.float32)))
    sess = tf.Session()
    with sess.as_default():
        cov = cov.eval()
    z = pm.MvNormal("z", mu=0, cov=cov.astype(np.float64), shape=100)
    sigma = pm.HalfCauchy("sigma", beta=10)

    likelihood = pm.Normal("y", mu=z, sd=sigma, observed=y)
```

The rbf function takes a lengthscale parameter - which is what you should be modeling in the first place. You should not use it this way; my suggestion is to port the rbf function to Theano first (again, it is equivalent to using the pm.gp module).
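For reference, the rbf / squared-exponential covariance is k(x, x') = exp(-(x - x')² / (2 ℓ²)), and the `gamma` parametrization from the first post corresponds to gamma = 1 / (2 ℓ²). A minimal numpy sketch of the kernel matrix (the Theano port is the same expression with `tt` ops; names here are illustrative):

```python
import numpy as np

def rbf(X, ls=1.0):
    """Squared-exponential (RBF) covariance matrix for 1-d inputs X."""
    X = np.asarray(X, dtype=float).reshape(-1, 1)
    sqdist = (X - X.T) ** 2               # pairwise squared distances
    return np.exp(-sqdist / (2.0 * ls ** 2))

K = rbf(np.linspace(0, 1, 5), ls=0.5)     # 5x5 symmetric matrix, ones on the diagonal
```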

1 Like

I tried to use `cov = pm.gp.cov.ExpQuad(1, ls=0.1).full(X)` (to confirm whether porting to Theano will work). Moreover, ExpQuad seems to be the same as the rbf. Still, the problem persists. Shouldn’t any covariance function from gp.cov produce the same number of predictions as the number of test/eval instances?

No, you need to condition on the test set to make predictions on the test set. In other words, you need to re-evaluate the covariance (and mean) function on `X_test` to make predictions of `y_test`.
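Concretely, the predictive mean at the test inputs is K(X_test, X_train) [K(X_train, X_train) + σ²I]⁻¹ y_train, so you get one prediction per test point. A small numpy sketch of this conditioning step (not the pm.gp internals; names like `k` and the data are illustrative):

```python
import numpy as np

def k(a, b, ls=0.3):
    # squared-exponential kernel between two 1-d input vectors
    return np.exp(-(a.reshape(-1, 1) - b.reshape(1, -1)) ** 2 / (2 * ls ** 2))

X_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * X_train)
X_test = np.linspace(0, 1, 50)            # any number of test points

sigma = 1e-3                              # observation noise
K = k(X_train, X_train) + sigma ** 2 * np.eye(len(X_train))
K_s = k(X_test, X_train)                  # cross-covariance, shape (50, 20)

mu_test = K_s @ np.linalg.solve(K, y_train)   # predictive mean: one value per test point
```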

1 Like

I don’t understand how to implement this. Can’t we directly use the GP module? It says that gp.Marginal can be used for regression with normally distributed data. Isn’t it similar to the RBF?

I used this code to train the model as shown in the example :

```
with pm.Model() as model:
    # Specify the covariance function.

    # Specify the GP.  The default mean function is `Zero`.
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Place a GP prior over the function f.
    sigma = pm.HalfCauchy("sigma", beta=3)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)
```

Then I used `gp.predict(X_new)` to predict. Can you please give me some more direction?

Thanks.

What you are doing is correct - I am not sure I understand what you mean when you say it doesn't work. I guess you are working on the same ARD problem as in your other post; maybe you can put up a notebook with some simulated data showing what you have tried, the error, and the intended output?

No, this is a different problem. Check this gist.

I’m not sure whether I generated the data correctly; I tried to follow the sklearn examples for the RBF kernel.

You can observe in the plot that the sklearn RBF kernel is perfectly aligned with the true y values (note that I did not add noise to the test dataset), yet all the predictions from the Bayesian implementations are around a single constant.

The prediction you are doing is not yet conditioned on the inference, see http://docs.pymc.io/notebooks/GP-Marginal.html#Making-predictions

1 Like

I am not completely sure what the implementation behind the kernel regression in sklearn is, but I managed to make your example work:

```
# bayesian rbf kernel
with pm.Model() as model:
    # hyper-prior for ls
    ls = pm.Beta("ls", alpha=1e-2, beta=1e-4)
    cov_func = pm.gp.cov.ExpQuad(1, ls=ls)

    # Specify the GP.
    gp = pm.gp.Marginal(cov_func=cov_func)

    # Place a GP prior over the function f.
    sigma = pm.HalfNormal("sigma", 1)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

    # inference
    map1 = pm.find_MAP()

pred = gp.predict(X_new, point=map1)
```
1 Like

Thanks a lot @junpenglao. That works now

Can you please tell me when we use mean functions with a GP?

The mean function is used when you have some prior knowledge of the overall trend of the data. I usually think of it this way: without a mean function, the predictions revert to zero slowly in areas where there is no observation; with a mean function (say a constant K), the predictions revert to K slowly in areas where there is no observation.
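This reversion to the mean is easy to check numerically: with a constant mean m, the posterior mean is m + K(X*, X)[K(X, X) + σ²I]⁻¹(y − m), and the second term vanishes far from the training inputs. A small numpy illustration (kernel and data are illustrative, not the pm.gp implementation):

```python
import numpy as np

def k(a, b, ls=0.2):
    # squared-exponential kernel for 1-d inputs
    return np.exp(-(a.reshape(-1, 1) - b.reshape(1, -1)) ** 2 / (2 * ls ** 2))

m = 5.0                                  # constant mean function
X_train = np.linspace(0, 1, 10)
y_train = m + np.cos(3 * X_train)        # data scattered around the mean

sigma = 0.1
K = k(X_train, X_train) + sigma ** 2 * np.eye(len(X_train))
alpha = np.linalg.solve(K, y_train - m)

X_far = np.array([10.0])                 # far from any observation
mu_far = m + k(X_far, X_train) @ alpha   # posterior mean reverts to m here
```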

1 Like

Ok, that makes sense.

One last question. In a GP, do we consider the noise or the covariance (or both) as the uncertainty of the predictions?

Both - but you can think of them as uncertainty from different sources. Notice that when you are working with a Gaussian likelihood, the noise is the variance added to the diagonal of the covariance matrix.
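With a Gaussian likelihood this is literally K_noisy = K + σ²I: the covariance captures uncertainty about the latent function, and the noise term is added only on the diagonal. A quick numpy check (illustrative kernel):

```python
import numpy as np

def k(a, b, ls=0.5):
    # squared-exponential kernel for 1-d inputs
    return np.exp(-(a.reshape(-1, 1) - b.reshape(1, -1)) ** 2 / (2 * ls ** 2))

X = np.linspace(0, 1, 6)
sigma = 0.3

K = k(X, X)                                  # function (covariance) uncertainty
K_noisy = K + sigma ** 2 * np.eye(len(X))    # plus observation noise on the diagonal
```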

1 Like

Ok. Now I get it.
I really appreciate the help. Thanks a lot @junpenglao .

1 Like

When I used the model with a real-world dataset, I got two different errors (RMSE) on two runs, although I used the same numpy random seed. Is that possible?

I have not seen this before - maybe different runs reach different local minima?
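As an illustration of why this can happen: an optimizer such as `find_MAP` can end up in different local minima depending on where it starts, even with everything else held fixed. A toy numpy sketch with a double-well objective (purely illustrative, not the PyMC3 optimizer):

```python
import numpy as np

def grad_descent(x0, lr=0.05, steps=500):
    """Plain gradient descent on f(x) = (x**2 - 1)**2, which has minima at -1 and +1."""
    x = x0
    for _ in range(steps):
        x -= lr * 4 * x * (x ** 2 - 1)   # f'(x) = 4x(x^2 - 1)
    return x

left = grad_descent(-2.0)    # converges to the minimum at -1
right = grad_descent(+2.0)   # converges to the minimum at +1
```

Same objective, same settings, two different answers - only the starting point changed.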

Is that possible even when using the same random seed?