The predict function of GP

Hi, I encountered some problems when building a Gaussian process model. I'm sorry if this seems silly, but I still cannot solve it.
My training data X is an 88×2 matrix and Y is an 88×400 matrix.

def main(argv=None):
    with pm.Model() as gp_model:
        ls = pm.Normal('ls', mu=0.2, sd=1)
        cov =, ls=ls)
        gp =
        eps = pm.Uniform('eps', -1, 1)
        y_pred = gp.marginal_likelihood('y_pred', X=X, y=Y.t(), noise=eps)
        trace = pm.sample(5000)
    point = {'ls': np.mean(trace['ls'], axis=0), 'eps': trace['eps'].mean()}
    Z = np.array([[13.83, 0.43]])
    mu, var = gp.predict(Z, point=point)
    print(mu, var)

if __name__ == '__main__':
    main()
It raises: ValueError: shapes of a (88, 88) and b (400, 88) are incompatible.
Hope someone can give me some suggestions. Thanks in advance.

Hi, can you include the full error message and the versions of pymc3 and theano you are using? It's much more likely we'll be able to help if you do.

Note that you can use markdown to format your post, i.e. three backticks ``` for code blocks so that indentation is preserved; > is intended for highlight/quote blocks. In addition, Discourse also has math support and some special tags for things like quote-replying or creating dropdowns. If the error message is too long and would clutter the post, use a dropdown to make it available yet hidden by default. You can do so from the gear icon (the rightmost one) or by writing it directly:

Error trace here, formatted as a code block.

Thanks for your prompt reply.
The full error message is:

Traceback (most recent call last):

  File "C:\Users\xiaoyun\.spyder-pytorch\", line 85, in <module>

  File "C:\Users\xiaoyun\.spyder-pytorch\", line 75, in main
    mu,var = gp.predict(Z, point=point)

  File "C:\Anaconda3\lib\site-packages\pymc3\gp\", line 556, in predict
    mu, cov = self.predictt(Xnew, diag, pred_noise, given)

  File "C:\Anaconda3\lib\site-packages\pymc3\gp\", line 579, in predictt
    mu, cov = self._build_conditional(Xnew, pred_noise, diag, *givens)

  File "C:\Anaconda3\lib\site-packages\pymc3\gp\", line 478, in _build_conditional
    v = solve_lower(L, rxx)

  File "C:\Anaconda3\lib\site-packages\theano\graph\", line 253, in __call__

  File "C:\Anaconda3\lib\site-packages\theano\graph\", line 130, in compute_test_value
    required = thunk()

  File "C:\Anaconda3\lib\site-packages\theano\graph\", line 476, in rval
    r = p(n, [x[0] for x in i], o)

  File "C:\Anaconda3\lib\site-packages\theano\tensor\", line 251, in perform
    rval = scipy.linalg.solve_triangular(A, b, lower=True)

  File "C:\Anaconda3\lib\site-packages\scipy\linalg\", line 338, in solve_triangular
    raise ValueError('shapes of a {} and b {} are incompatible'

ValueError: shapes of a (88, 88) and b (400, 88) are incompatible

The versions of pymc3 and theano-pymc are 3.11.2 and 1.1.2, respectively.
Thanks again.

I think the issue here is that Y has multiple observations. Unfortunately this isn’t supported in PyMC3. If you check the docstring for gp.Marginal.marginal_likelihood, it says:

y: array-like
Data that is the sum of the function with the GP prior and Gaussian noise. Must have shape (n, ).


Thank you very much for your answer. This is really bad news; I hope to see this supported in PyMC4.


I think you can still make it work, you might just need to shape your data differently. Judging by the shape of your matrices, it looks like you have 88 conditions, which differ by the 2D X coordinate, and 400 observations for each condition. What are these observations?

If these observations are something continuous like a time series or spectra, then you just need to add that as an extra dimension to your model. Let’s call that dimension t. Your X matrix becomes shape (35200, 3), where X[:,0] is X_0, X[:,1] is X_1, and X[:,2] is t, and your Y matrix becomes shape (35200,). Note you’ll also need to include this additional dimension in your covariance structure.
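As a concrete illustration of the reshape described above, here is a small numpy sketch (shapes taken from the thread; the time-like coordinate `t` is a hypothetical placeholder for whatever indexes your 400 observations):

```python
import numpy as np

# Hypothetical data with the shapes from the thread:
X = np.random.rand(88, 2)        # 88 conditions x 2 wind-speed inputs
Y = np.random.rand(88, 400)      # 400 observations per condition
t = np.linspace(0.0, 1.0, 400)   # the extra "time-like" coordinate

# Repeat each input row 400 times and attach the matching t value, so that
# row k of X_full corresponds to element k of Y_flat (row-major order).
X_full = np.column_stack([np.repeat(X, 400, axis=0), np.tile(t, 88)])
Y_flat = Y.reshape(-1)

print(X_full.shape, Y_flat.shape)  # (35200, 3) (35200,)
```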

If these observations are realizations of some random variable, and the GP represents some latent function, just reshape your X to shape (35200, 2) and your Y to shape (35200,). You end up repeating the same coordinate pair over and over again for repeated observations, but that's fine. If your random variable likelihood is Gaussian, you can stick with a marginal GP like you have there. If your RV is non-Gaussian, you have to use a latent GP, but otherwise the structure is the same and there are several examples in the docs about how to do that.
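The repeated-coordinates variant can be sketched the same way; `np.repeat` keeps each coordinate pair aligned with the row-major flattening of Y (variable names here are illustrative):

```python
import numpy as np

X = np.random.rand(88, 2)    # 88 distinct input conditions
Y = np.random.rand(88, 400)  # 400 repeated observations per condition

# Repeat each coordinate pair once per observation; row-major flattening
# of Y keeps observation k aligned with X_rep[k].
X_rep = np.repeat(X, 400, axis=0)  # (35200, 2)
y_flat = Y.reshape(-1)             # (35200,)
```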


I understand what you mean. This is indeed a way to trade a larger quantity of data points for the high-dimensional output.
The input data is a variable similar to wind speed, and the output data is the wind pressure at 400 measuring points on a building. This means that for a given pair of input values, the wind pressure values at all 400 measuring points are obtained at the same time.
Your suggestion is equivalent to establishing a GP model for each measuring point, 400 GP models in total. After building the GP model, I will use it inside MCMC to compute the posterior distribution of the input variables, which means that every MCMC step requires the output of all 400 GP models established before. Still, it is indeed a method; I think I might try it. Thank you very much for your suggestions.


Agree with @BioGoertz, but unfortunately those approaches end up being pretty slow (but often that’s all you can do).

This has been bugging me for a while, and there isn’t really anything tricky about the implementation, so here’s an example of multioutput GPs for Marginal, hopefully that helps you out in this case. I def agree that it would be great if PyMC4 had this functionality built-in.


Yeah, definitely look into the Sparse GP implementations, they should help a lot with speed.

According to your suggestion, it is equivalent to I established a gp model for each measuring point, a total of 400 gp models.

Only sort of. My suggestion is to go one of two directions, depending on whether you care about each sensor individually (that is, you are actually interested in comparing how sensor 1 behaves at wind speed [13.83, 0.43] vs [10.8, 0.23], and separately how sensor 2 behaves at various wind speeds) or if you only care about them collectively, i.e. you just want to know the mean and standard deviation of the sensor readings. The second case is easier, and seems most similar to your model. This is really just one GP, not 400, but with repeated observations:

# Tile and reshape your input matrix, (88,2) -> (35200,2)
X_tiled = np.tile(X,[1,1,2]).reshape([-1,2]) 
# Transform and flatten observation matrix, (88,400) -> (35200,)
# Note that if your observations are strictly positive, you should probably 
# transform them from (0, +inf) to (-inf, +inf) via a log link. Of course, 
# if an observation of 0 is possible, a different transformation function 
# or a different likelihood model (via a `Latent` GP) would be better.
lg_y_flat = np.log(y.flatten()) 

with pm.Model() as gp_model:
    ls = pm.Normal('ls', mu=0.2, sd=1)
    # Is there a reason you didn't include scaling?
    η = pm.Gamma('η', alpha=2, beta=1)
    cov = η**2 *, ls=ls)
    gp =
    eps = pm.Exponential('eps', 1)  # Noise should be strictly positive
    y_pred = gp.marginal_likelihood('y_pred', X=X_tiled, y=lg_y_flat, noise=eps)
    # I'd strongly recommend starting with the MAP, at least for prototyping the model
    mp = pm.find_MAP()

Alternatively, if you do care about individual sensor behavior, I'd encode the x/y/z position of each sensor in the input data. That means you now have 5 input dimensions (two related to wind speed and three related to position), which will allow the model to capture the spatial correlations between sensor readings. You may want to use different covariance kernels for the spatial vs wind-speed inputs.
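To make the "different kernels per input group" idea concrete, here is a plain-numpy sketch of a product of two squared-exponential kernels with separate lengthscales for the wind-speed dimensions (0–1) and the spatial dimensions (2–4). In PyMC3 this would correspond to multiplying two covariance functions restricted via `active_dims`; the data and lengthscales below are made up for illustration:

```python
import numpy as np

def sqexp(A, B, ls):
    """Squared-exponential kernel with one lengthscale per column."""
    d = (A[:, None, :] - B[None, :, :]) / ls       # pairwise scaled differences
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1))  # (n, m) Gram matrix

rng = np.random.default_rng(0)
X5 = rng.random((10, 5))  # cols 0-1: wind-speed inputs, cols 2-4: sensor x/y/z

# Separate lengthscales for each input group, multiplied into one kernel
# (the analogue of multiplying two cov functions with active_dims in PyMC3).
K = sqexp(X5[:, :2], X5[:, :2], ls=np.array([0.3, 0.3])) \
  * sqexp(X5[:, 2:], X5[:, 2:], ls=np.array([1.0, 1.0, 1.0]))

print(K.shape)  # (10, 10)
```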

Additional modeling tips

A couple extra thoughts come to mind.

  1. You say X is related to wind speed, might I hazard a guess that these are vectors of speed and angle? If that’s the case, you may want to consider a Circular Kernel that will likely capture angular continuity better.

  2. Particularly at high dimensions, it will be very important to have reasonable priors on the hyperparameters. I’d check out Michael Betancourt’s blog post on the topic and my answer here for how I implement his suggestions in python.
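Regarding point 1, if a circular kernel turns out to be inconvenient, a common alternative (my suggestion, not from the thread) is to encode the angle as (sin, cos) features so that an ordinary Euclidean kernel respects the 0/360 wrap-around; the wind data below is hypothetical:

```python
import numpy as np

# Hypothetical wind inputs: column 0 = speed, column 1 = direction in degrees.
X = np.array([[13.83, 359.0],
              [13.83, 1.0]])  # nearly identical directions across the 0/360 wrap

theta = np.deg2rad(X[:, 1])
# Encode the angle as (sin, cos) so Euclidean distance respects periodicity.
X_enc = np.column_stack([X[:, 0], np.sin(theta), np.cos(theta)])

# The encoded points are now close together, as they should be physically,
# even though the raw angles differ by 358 degrees.
print(np.linalg.norm(X_enc[0] - X_enc[1]))
```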


Hi, thank you for your very sincere help. I spent several days trying to understand your suggestion in my case, as I'm a novice.

# Tile and reshape your input matrix, (88,2) -> (35200,2)
X_tiled = np.tile(X,[1,1,2]).reshape([-1,2])

I think it should be:

X_tiled = np.tile(X,[1,1,400]).reshape([-1,2])

Hope I got what you mean. And when I use the predict function and enter two input values, I get one output value.

What I want to say is that my previous description of the observations may have been a bit vague. I originally thought that entering two values could yield 400 values, like a Gaussian process model object in sklearn, because these 400 values are ordered: they are not randomly sampled from some distribution, each has a different sequence and position. (I don't know if I have expressed this clearly.) So I hope to achieve the effect of establishing a GP model for each measuring point.

All in all, thank you very, very much. I think I might find a way to process my training data and use the method you suggested to achieve my goal.


Great, good luck! Good catch on the typo in my code. I'm not sure how sklearn is implemented to achieve the behavior you described… But yeah, I think you need to treat that "order" of your 400 values as an extra input dimension (or three).