Regression using RBF kernel

I’m trying to implement a RBF kernel for regression (similar to Kernel used in sklearn SVR). I want to capture non-linear relationships between the predictors and dependent variable.

gamma = 1
with pm.Model() as model:
    x_i = pm.Normal("alpha", mu=0, sd=100, shape = X.shape[1])
    w = pm.Normal("w", mu=0, sd=100, shape=X.shape[1])
    sigma = pm.HalfCauchy("sigma", beta=10)
    mu =, tt.exp(-gamma * tt.square(tt.sub(X, x_i))).T)

    # Define likelihood
    likelihood = pm.Normal("y", mu=mu, sd=sigma, observed=y)

But this does not seem to be working. I may have misunderstood the idea of RBF kernel.

Any help to understand the correct implementations is greatly appreciated.


Why not use a Gaussian process with the squared exponential cov function? The two is equivalent.

If you want to recreate the RBF regression, you can try rewriting the example from Edward into PyMC3: with the rbf function.

1 Like

Thanks for the suggestion. I implemented that, but now I don’t understand how to make the predictions because the number of predictions are always equal to the number of samples used for training.

I used the same rbf function and this is the PyMC3 implementation.

with pm.Model() as model:
    cov = tf.cholesky(rbf(X.get_value().astype(np.float32)))
    sess = tf.Session()
    with sess.as_default():
        cov = cov.eval()
    z = pm.MvNormal("z", mu=0, cov=cov.astype(np.float64), shape=100)
    sigma = pm.HalfCauchy("sigma", beta=10)
    likelihood = pm.Normal("y", mu=z, sd=sigma, observed=y)

Can you please help me with figuring out how to make the predictions?

The rbf function takes lenghtscale parameter - which is what you are modeling in the beginning. You should not use it this way, my suggestion is to port the rbf function to theano first (again, it is equivalent as using the module).

1 Like

I tried to use cov =, ls=0.1).full(X) (to confirm whether porting to Theano will work). Moreover, ExpQuad seems the same as the rbf. Still the problem persists. Shouldn’t any covariance function from gp.cov produce the same number of predictions as the number of test/eval instances?

No, you need to conditioned one the test set to make prediction on the test set. In another word, you need to reevaluate the cov (and mean) function on the X_test to make prediction of y_test

1 Like

I don’t understand how to implement this. Can’t we directly use the GP module. It says that gp.Marginal can be used for regression with normally distributed data. Is not it similar to RBF?

I used this code to train the model as shown in the example :

with pm.Model() as model:
    # Specify the covariance function.
    cov_func =, ls=0.1)

    # Specify the GP.  The default mean function is `Zero`.
    gp =

    # Place a GP prior over the function f.
    sigma = pm.HalfCauchy("sigma", beta=3)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)

Then used gp.predict(X_new) to predict. Can you please provide me some more direction?


What you are doing is correct - I am not sure I understand what you meant when you said it doesnt work? I guess you are working on a same ARD problem in your other post, maybe you can put up a notebook with some simulation data to show the things you have tried with the error and intended output?

You are right, Sorry that I did not provide more information.

No this is a different problem. Check this gist.

I’m not sure whether I generated the data correctly, I tried to follow the sklearn examples for RBF kernel.

You can observe in the plot, that sklearn RBF kernel is perfectly aligned with the true y values (Note that I did not add noise to the test dataset), yet all the predictions from Bayesian implementations is around a single constant.

The prediction you are doing is not yet conditioned on the inference, see

1 Like

I am no completely sure what is the implementation behind the kernel regression in sklearn, but I manage to make your example work:

# bayesian rbf kernel
with pm.Model() as model:
    # hyper-prior for ls
    ls = pm.Beta("ls", alpha=1e-2, beta=1e-4)
    cov_func =[1], ls=ls)

    # Specify the GP.
    gp =

    # Place a GP prior over the function f.
    sigma = pm.HalfNormal("sigma", 1)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=sigma)
    # inference
    map1 = pm.find_MAP()
    pred = gp.predict(X_new, point=map1)
1 Like

Thanks a lot @junpenglao. That works now :smiley: :smiley:

Can you please tell me, when we use the mean functions with GP?

The mean function is used when you have some prediction of the overall trend of the data. I usually think of it this way: without mean function, the data goes back to zero slowly in the area there is no observation. And with a mean function (say a constant of K) the data goes to K slowly in the area where there is no observation.

1 Like

Ok, that make sense.

One last question. In GP, are we considering the noise or the covariance (or both) as the uncertainty of the predictions?

Both - but you can think of them as the uncertainty from different source. Notice that when you are working with Gaussian likelihood, the noise is the variance added to the diagonal of the cov matrix.

1 Like

Ok. Now I get it.
I really appreciate the help. Thanks a lot @junpenglao .

1 Like

When I used the model with some real-world dataset, although I used the same value with numpy random seed, I get 2 different errors (RMSE) for each run. Is it possible?

I have not seen it before, maybe different runs reach to different local minimals?

Is it possible when using the same random seed?