2d Gaussian process IndexError

madarshahian · May 13, 2019, 9:18am

This is an example similar to an example for applying 2d gaussian process by @fonnesbeck. X_.shape=(10970, 2) and Y_.shape = (10970,). After running it I get this error at the line related to posterior predictive:

if samples.shape[0] == 1 and size == 1:

IndexError: tuple index out of range

Wondering if you could help me to resolve it?thanks!

#%%
X_shared=th.shared(X_)
Y_shared=th.shared(Y_)
#%%
nd = 30
z1, z2 = np.meshgrid(np.linspace(X_[:,0].min(), X_[:,0].max(), nd), np.linspace(X_[:,1].min(), X_[:,1].max(), nd))
Z = np.concatenate([z1.reshape(nd*nd, 1), z2.reshape(nd*nd, 1)], 1)

nd = 15
xu1, xu2 = np.meshgrid(np.linspace(X_[:,0].min(), X_[:,0].max(), nd), np.linspace(X_[:,1].min(), X_[:,1].max(), nd))
Xu = np.concatenate([xu1.reshape(nd*nd, 1), xu2.reshape(nd*nd, 1)], 1)
#%%
with pm.Model() as spatial_model:
    
    l = pm.HalfCauchy("l", beta=3, shape=(2,))
    sf2 = pm.HalfCauchy("sf2", beta=3)
    sn2 = pm.HalfCauchy("sn2", beta=3)

#    K = pm.gp.cov.ExpQuad(2, l) * sf2**2
    K = pm.gp.cov.Matern32(2,l) * sf2**2
    
    gp_spatial = pm.gp.MarginalSparse(cov_func=K, approx="FITC")
    obs = gp_spatial.marginal_likelihood("obs", X=X_shared, Xu=Xu, y=Y_shared, noise=sn2)

    trace = pm.sample(cores=1)
#%%

with spatial_model:
    f_pred = gp_spatial.conditional('f_pred2', Z)
    samples = pm.sample_posterior_predictive([trace], vars=[f_pred], samples=100)
#%%

madarshahian · May 15, 2019, 2:12pm

To add more to this question here is what I get after running the code:
1_pic

Gon_F · May 16, 2019, 8:25pm

You would really have to post where your data comes from so anyone that wants to help could actually run your code.

madarshahian · May 16, 2019, 10:06pm

@Gon_FThank you for the message, but the problem should not be the data, I explained the data shape in my explanation. To explain more, X_ , input is longitude and altitude, and output is a measurement for each of them. You can assume a grid that y is 2X_[:,0]+3X_[:,1]+noise. If it really matters I would be happy to share the actual data as well.

Gon_F · May 17, 2019, 5:20pm

I just mentioned it because it gives me peace of mind to know the error is not coming from the data : P , and it allows me to work along with the code and run it, instead of just abstractly analyzing it.

madarshahian · May 17, 2019, 6:23pm

data.csv (1.5 MB)
@Gon_F here is the data, there are some missing values and I used (pandas dropna) to discard them.
Thanks!

madarshahian · May 20, 2019, 3:19pm

I am wondering if the code looks good? or there is an error in it that I couldn’t fine? What is the reason for the error?

fonnesbeck · May 23, 2019, 1:18am

The model does run on my system, using current master (macOS and Python 3.7.3). Can you try that (or the release candidate that was just pushed)?

madarshahian · June 3, 2019, 1:36pm

Hey,
Sorry I missed your message, and thanks so much for your attention; after seeing it I updated my pymc3. For some reason I couldn’t use the developer version using git clone. So I have just updated it to PyMC3 version v.3.7 from v.3.6.
The code now works, but I still have two problems after this update:
1- this warning still is there, and I do not know how to solve it:
`

C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\envs\pymc3env\lib\site-packages\theano\tensor\basic.py:6611: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use arr[tuple(seq)] instead of arr[seq]. In the future this will be interpreted as an array index, arr[np.array(seq)], which will result either in an error or a different result.

`
2-

File “C:\Users\xxxxx\AppData\Local\Continuum\anaconda3\envs\pymc3env\lib\site-packages\pymc3\plots_init_.py”, line 22, in call
“ArviZ is not installed. In order to use {0.attr}:\npip install arviz”.format(self)

ImportError: ArviZ is not installed. In order to use plot_forest:

This Arviz error is really annoying! Last time when this happened I had to re install whole Anaconda. Even when I install it, and installation is done successfully, it keep giving me this error.

Gon_F · June 3, 2019, 6:57pm

So, using the following code on Pymc3 3.7, everything works out without any errors if I run the code on a subset of the first 400 points in the dataset.

Apologies for taking so long to reply: I kept trying to run the whole model but even on my fast computer, it kept taking 20+ hrs., and my python session would quit before that based on some settings I could not change.

If the code below also works out for you, I imagine the sheer size of your dataset stretches the GP methods to their limits, and whatever breaks first is the error you end up getting.

Data import + pre-preprocessing:

data_discourse = pd.read_csv('data_discourse.csv')

np.shape(data_discourse)

data_discourse.head(10)

data_discourse.dropna(inplace=True)

data_discourse.reset_index(inplace=True, drop=True)

np.shape(data_discourse)

data_discourse.head(10)

X_ = data_discourse.iloc[:, :2].values
Y_ = data_discourse.iloc[:, 2].values

X_[:5]
Y_[:5]

Data preprocessing:

import theano as th
import numpy as np

#%%
X_shared=th.shared(X_)
Y_shared=th.shared(Y_)

nd = 30
z1, z2 = np.meshgrid(np.linspace(X_[:,0].min(), X_[:,0].max(), nd), np.linspace(X_[:,1].min(), X_[:,1].max(), nd))
Z = np.concatenate([z1.reshape(nd*nd, 1), z2.reshape(nd*nd, 1)], 1)

nd = 15
xu1, xu2 = np.meshgrid(np.linspace(X_[:,0].min(), X_[:,0].max(), nd), np.linspace(X_[:,1].min(), X_[:,1].max(), nd))
Xu = np.concatenate([xu1.reshape(nd*nd, 1), xu2.reshape(nd*nd, 1)], 1)

Model:

with pm.Model() as spatial_model:
    
    l = pm.HalfCauchy("l", beta=3, shape=(2,))
    sf2 = pm.HalfCauchy("sf2", beta=3)
    sn2 = pm.HalfCauchy("sn2", beta=3)

#    K = pm.gp.cov.ExpQuad(2, l) * sf2**2
    K = pm.gp.cov.Matern32(2,l) * sf2**2
    
    gp_spatial = pm.gp.MarginalSparse(cov_func=K, approx="FITC")
    obs = gp_spatial.marginal_likelihood("obs", X=X_shared[:200], Xu=Xu, y=Y_shared[:200], noise=sn2)

    trace = pm.sample(cores=2, chains=2)

with spatial_model:
    f_pred = gp_spatial.conditional('f_pred2', Z)
    samples = pm.sample_posterior_predictive([trace], vars=[f_pred], samples=100)

madarshahian · June 4, 2019, 3:42pm

Thanks so much. Actually it worked for me after updating pymc3. One of the things I need to do is to manually write a code to generate posterior predictive for this model. Since at the end of the day I will pass this to people who don’t have pymc3. For simple models like Multiple regression, this is easy for me to construct, but for this model is more confusing. Specially I see instead of changing x_shared, we used “gp_spatial.conditional('f_pred2' , Z)”. I am wondering if you can advice me how to write posterior predictive using my posterior for this model? Is there any Layman reference? ~Thanks!

madarshahian · June 7, 2019, 1:50pm

To continue the discussion above, the parameters are sn2, sf2, l. Can I claim “sn2” is something like standard deviation of residuals? and how can I make posterior predictive (without using pymc3) if I have posterior samples for sn2, sf2, l from pymc3? Is there any easy read reference?

Gon_F · June 7, 2019, 4:55pm

I am confused by what you mean just now, but if you want to write A) an analytical expression for your posterior, that would be difficult, and I would hopefully direct you to Rasmussen’s GP book, or B) create predictions without Pymc3, after gaining sampled chains from Pymc3, that would be recreating a complex component manually, which imho would not be easy, and I have no good reference for that.

The best I would hope is maybe one of the developers can chime in on what posterior predicting based on new data precisely does in Pymc3.

madarshahian · June 14, 2019, 8:56pm

Thanks! You got it. So I want to use chains I have for sn2 and sf2 and l to generate a posterior predictive if someone ask me what is the function value for new x1 and x2. However, I do not want to use pymc3 anymore, I just want to get chains for sn2 and sf2 and l from pymc3 and then write a code to use them to generate posterior predictive. This is easy for regression models, but I am not sure how it gonna be exactly for Gaussian process.

Topic		Replies	Views
Multidimensional gaussian process Questions	55	10107	October 22, 2020
IndexError: tuple index out of range error from pm.sample v5 gaussian_process , modeling , sampling	11	1250	February 16, 2023
Error IndexError: index out of bounds Questions	2	679	March 31, 2021
IndexError: tuple index out of range v3	4	824	June 16, 2022
Either "too many indices for array" or "array must not contain infs or NaNs" error with a GP Questions	4	2931	May 18, 2019

2d Gaussian process IndexError

Related topics