Multidimensional gaussian process


I am trying to implement a scalar valued gaussian process model of an expensive function which has a multi-dimensional input. Once I have created this gaussian process, I would like to sample from this process to obtain distributions on my input parameters. I believe I am creating the function correctly:

ls = [1,1,1,1]
noise = 0.1
X = (4 by npts array)

with pm.Model() as gp_model:
    cov_func =, ls=ls)
    gp =
    f = gp.prior("f",X=X)
    y = pm.MvNormal('y',observed=errors,mu=f,cov=noise**tt.eye(npts),shape=npts)
    trace = pm.sample(5000, start=model.test_point, seed=1)

but when I try and sample using a collection of points I get an error

params = [1.0, -2.4, 3.6, 1.3]
with gp_model:
    fcond = gp.conditional("fcond",Xnew=params)

->TypeError: list indices must be integers, not tuple

Where am I going wrong?


Try using params = np.array([1.0, -2.4, 3.6, 1.3])


That seems to have solved part of my problem but now I’m seeing another error:

with gp_model:
    fcond = gp.conditional("fcond",Xnew=np.array(params))

IndexError: too many indices for array. Any thoughts about why this could be happening?

params = np.array([[1.0, -2.4, 3.6, 1.3]])

(one more pair of brackets)

which has 1 row and 4 columns. Also like I’d suggest a couple other things that might be helpful,

  • Use gp.Marginal, since the likelihood is MvNormal it’s conjugate to the GP. You’ll get a big speed up. If you wish to use gp.Latent with that likelihood, since your covariance for MvNormal is diagonal, you can use Normal instead which will be more efficient.
  • Set njobs=1 in the pm.sample(...) call. The matrix operations used by Theano here are multithreaded, so running multiple chains simultaneously bogs things down.
  • There should be no need to set start=model.test_point if everything is specified properly (should be gp_model.test_point I think).

NUTS uses all cores

Thanks for your help!

I’ve tried making a few changes. I will admit to be mostly in the dark when it comes to gaussian process models. Following the examples, I’ve rewritten my model as follows:

nvar = 4

with pm.Model() as model:
    ls    = pm.Gamma('l', shape=nvar, alpha=3.0, beta=5.0)
    eta   = pm.HalfCauchy('eta', shape=nvar, beta=5.0)
    sigma = pm.HalfNormal('sigma', 1.0)
    y_    = g.marginal_likelihood('y', X=X0.T, y=errors, noise = sigma)
    trace = pm.sample(1000,njobs=1)

That seems to work (though NUTS is bogging down in some chains). Is there any reason that could be happening? My first chain ran fairly fast (28.7 it/s) but my second is at around (1.7 it/s).


It would help using more informative priors. For example, instead of using a HalfCauchy distribution, try a HalfNormal.

I recommend reading Michael Betancourt’s blog on Gaussian Process, especially on informative priors.


That makes sense. Like I said, gaussian process models are new for me so I am feeling around in the dark with them. I will read that article and try to improve my priors.

Thanks for your help!


Okay, I have managed to get a fit to my gaussian process that I am comfortable with. I now want to sample from this gaussian process in another model.

I am varying four parameters which I then need to pass into the gaussian process. My new model is defined as follows:

with pm.Model() as model:
    a_ = pm.Normal('a',0,1)
    b_ = pm.Normal('b',0,1)
    c_ = pm.Normal('c',0,1)
    d_ = pm.Normal('d',0,1)

    sigma = pm.Normal('sigma',0.,1.)

    f_pred = gp.predict([[a_,b_,c_,d_]],point=mp)

    error = pm.Normal('error',mu=f,sigma=sigma,observed=0.)

which returns a TypeError. In general, I may want to define different distributions for the parameters. How do I combine these into a vector to pass to the gaussian process?


I think you might want to check out tt.stack, or maybe tt.concatenate. PyMC3 variables are really theano tensors, which has similar array operations/syntax as numpy.

import theano.tensor as tt

Also, watch out for predict. It is a convenience method and returns the mean and covariance as a numpy array. It’s handy if your last step of your model is to get the predictive/conditional mean and covariance. To continue modeling in PyMC3 you should use predictt (the extra t is for theano). It returns the covariance and mean as a symbolic variable. From your model though, I would guess that you really want to use conditional.

To follow up on @junpenglao’s comment on priors, since GPs are so flexible, using informative priors on the hyperparameters is a very good idea. The choice of covariance function and hyperparameters should be considered of equal importance when you’re designing a GP prior for your functions.


Thank you for the advice there. I tried tt.stack but wasn’t having much luck. I need to familiarize myself with theano’s functions more than I am.

I think I need to review gaussian processes more so that I can make better decisions regarding them. Most of what I am doing is to copy the examples without understanding which is not ideal. I will try your suggestions out and report back on how they go.


I am getting closer to a solution but I am still having some difficulties with syntax. I have the following definitions in python

nvar = 4
with pm.Model() as surrogate_model:
    ls  = pm.HalfNormal('l',2./3,shape=nvar)
    eta = pm.HalfNormal('sigma',1.0,shape=nvar)
    cov = eta**2.*,ls_inv=ls)

    gp =

    y_ = gp.marginal_likelihood('y',X=X0.T,y=errors,noise=sigma)

    mp = pm.find_MAP()

where X0 has a shape of (30,4). My intention is to create a gaussian process model which has a scalar output and four parameter input. This code executes without any difficulty.

Now I am trying to sample this gaussian process using random variables so I define

with surrogate_model:
    a_ = pm.Normal('a',params[0], 0.1*abs(params[0]))
    b_ = pm.Normal('b',params[1], 0.1*abs(params[1]))
    c_ = pm.Normal('c',params[2], 0.1*abs(params[2]))
    d_ = pm.Normal('d',params[3], 0.1*abs(params[3]))
    sigma2 = pm.HalfNormal('sigma2',1.0)

    params_ = tt.stack([a_,b_,c_,d_]).reshape([1,4])

    f_cond = gp.conditional("f_cond",Xnew=params_)

    error = pm.Normal('error',mu=f_cond,sigma=sigma2,observed=0.)

    trace = pm.sample(1000,njobs=1)

This code fails at the definition of f_cond where I get

ValueError: Input dimension mis-match. (input[0].shape[1] = 1, input[1].shape[1] = 4)

It seems like what may be happening is that I’m not defining my gp to have four inputs at all. However when I look at

[0 1 2 3]

which seems to show that I do have four degrees of freedom. I tried the following to see if I could reproduce my training data

with surrogate_model:
    f_pred = gp.conditional('f_pred',Xnew=X0.T)

with surrogate_model:
    pred_samples = pm.sample_ppc([mp],vars=[f_pred],samples=2000)

and I received the error

ValueError: incompatible dimensions

Any advice would be greatly appreciated.


I found the source of that error. I had made eta a 4d vector and it had to be a scalar for the multiplication to form the covariance function. Now I am receiving the following error:

with surrogate_model:
    f_pred = gp.conditional('f_pred',Xnew=X0)

TypeError: For compute_test_value, one input test value does not have the requested type.

The error when converting the test value to that variable type:
Wrong number of dimensions: expected 1, got 2 with shape (30,30).


Hmm, I think at this point, it would be good if you could post a reproducible example with your current code and some sample data, that can run and gives the error? I think we can get to the bottom of it then.

There are a couple inconsistencies I can see, like there’s no sigma defined in your first code block, but I’m not sure where the issue is exactly you see in the last line.


Sure thing. I’m sorry for the confusion in my code above. I was having to transcribe the code by hand (long story) so I’m sure I made mistakes.

I’ve attached a simple python script that demonstrates the error.

Thank you for taking a look. (1.9 KB)


Ah, found the issue. Change

errors = np.array([error_fxn(p,x,y) for p in X])[:,None]


errors = np.array([error_fxn(p,x,y) for p in X])

making the shape of errors (n,) instead of (n,1). This is briefly mentioned in the docstring for Marginal.marginal_likelihood. This could probably be more prominent.

One extra thing: Of course I don’t know your application, but are you sure you want to fit and predict the error directly, instead of y? I’m assuming you want to recover y_true, while you observe y?


Thanks for taking a look! The fact that it is a vector output threw me for a moment but now I see that it is (of course) because X is a vector of points. I’m obviously new at this…

I agree with you that for this simple case it would make much more sense to compare the curve to the observed data directly. I am using this approach since some colleagues of mine are related strategies for the application problem and I want to understand their method better.


Okay, hopefully this will be my last question in this chain. I am now trying to sample from the gaussian process via the following definition (immediately following my code attached previously)

with pm.Model() as surrogate_model:
    a_     = pm.Normal('a',params[0],0.1*abs(params[0]))
    b_     = pm.Normal('b',params[1],0.1*abs(params[1]))
    c_     = pm.Normal('c',params[2],0.1*abs(params[2]))
    d_     = pm.Normal('d',params[3],0.1*abs(params[3]))
    sigma2 = pm.HalfNormal('sigma2',1.)
    params_ = tt.stack([a_,b_,c_,d_]).reshape([1,4])
    e_pred = gp.conditional('e_pred',Xnew=params_,shape=(1,4))
    e = pm.Normal('e',mu=e_pred,sd=sigma2,observed=np.array([0.]))
    trace2 = pm.sample(1000,njobs=1)

but I get

ValueError: Input dimension mis-match. (input[0].shape[1] = 4, input[1].shape[1] = 1)

Note that I also have to specify the shape since the shape can’t be determined which makes me think that I am probably constructing params_ incorrectly.

When I sample from the posterior as in the attached python code, I have to pass in a nx4 vector where n is the number of points to estimate at. I have verified that I can pass in a single point in the following context and everything works fine.

Xtmp = np.reshape(X[0],[1,4])

with surrogate_model:
    f_pred = gp.conditional('f_pred',Xnew=Xtmp)
with surrogate_model:
    pred_samples = pm.sample_ppc(trace,vars=[f_pred],samples=2000)

Is it incorrect to try and sample from a gp in this way?


This should work:

with surrogate_model:
    params_ = pm.Normal('params_', params, 0.1 * tt.abs_(params), shape=(1, 4))
    sigma2 = pm.HalfNormal('sigma2', 1.)

    e_pred = gp.conditional('e_pred', Xnew=params_, shape=1)

    e = pm.Normal('e', mu=e_pred, sd=sigma2, observed=np.array([0.]))
    trace2 = pm.sample(1000,njobs=1)

But I am not sure about your use case, are you trying to do PPC with new input X from a Gaussian with parameters params? In that case, it would be easier if you generate a numpy arrary from Gaussian and pass to the conditional:

Xnew = np.random.randn(100, 4) * np.abs(params) * .1 + params

with surrogate_model:
    e_pred = gp.conditional('e_pred', Xnew=Xnew)
    pred_samples = pm.sample_ppc(trace, vars=[e_pred], samples=500)
sigma2 = pm.HalfNormal.dist(1.).random(size=(500, 1))
e = pm.Normal.dist(pred_samples['e_pred'], sigma2).random()


Thanks for taking a look! This solution does seem to get me over that error but in general I would like my distributions on each of my parameters to be different. How can I achieve that?

In terms of your solution, I’m still getting an error with the above solution though it is now:

ValueError                                Traceback (most recent call last)
<ipython-input-5-9e0c9073af7d> in <module>()
      7     e = pm.Normal('e', mu=e_pred, sd=sigma2, observed=np.array([0.]))
----> 8     trace2 = pm.sample(1000,njobs=1)

c:\users\natha\onedrive\documents\python3_modules\pymc3\pymc3\ in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
    393             start_, step = init_nuts(init=init, chains=chains, n_init=n_init,
    394                                      model=model, random_seed=random_seed,
--> 395                                      progressbar=progressbar, **args)
    396             if start is None:
    397                 start = start_

c:\users\natha\onedrive\documents\python3_modules\pymc3\pymc3\ in init_nuts(init, chains, n_init, model, random_seed, progressbar, **kwargs)
   1307         var = np.ones_like(mean)
   1308         potential = quadpotential.QuadPotentialDiagAdapt(
-> 1309             model.ndim, mean, var, 10)
   1310     elif init == 'advi+adapt_diag_grad':
   1311         approx =

c:\users\natha\onedrive\documents\python3_modules\pymc3\pymc3\step_methods\hmc\ in __init__(self, n, initial_mean, initial_diag, initial_weight, adaptation_window, dtype)
    119         if initial_diag is not None and len(initial_diag) != n:
    120             raise ValueError('Wrong shape for initial_diag: expected %s got %s'
--> 121                              % (n, len(initial_diag)))
    122         if len(initial_mean) != n:
    123             raise ValueError('Wrong shape for initial_mean: expected %s got %s'

ValueError: Wrong shape for initial_diag: expected 12 got 6

I’ve attached my python code using your solution for convenience. (2.0 KB)

In terms of what I want to do, I have a function which is expensive to evaluate and doesn’t have the features necessary to use NUTS or other more efficient samplers but I want to fit this function to data and understand the uncertainty on the parameters. My intention is to create a surrogate model which I can evaluate quickly and efficiency to get this uncertainty.

So my steps are:

  1. Generate potential model fits by varying my parameters over some range. This could be random sampling or populating a grid.
  2. Fit a gaussian process model to the error between my potential model fits and my data with my parameters as the input variables.
  3. Using this gaussian process model, sample my parameters from arbitrary distributions to reduce my uncertainty on their value.

Does that make sense?


Yeah you are not suppose to sample twice from the same model after you add new nodes, I am not too surprise that it doesnt work.

they are, each column is distributed differently in Xnew = np.random.randn(100, 4) * np.abs(params) * .1 + params

I think it should work.