A very simple GP fails, giving a non-positive-semidefinite covariance and NaN predictions

When sampling from the posterior predictive, it gives the following warning:

with model:
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc(trace, vars=[f_pred], samples=100)

RuntimeWarning: covariance is not positive-semidefinite.
  out = random_state.multivariate_normal(mean, cov, size)

IMO, there is no way the covariance can be non-PSD, since it is computed solely by the built-in covariance functions.
And the predictions for the test data (ppc['f_pred']) contain a lot of NaN values.

See the model first, followed by the data.

with pm.Model() as model:
    # Kernel
    c = pm.Normal('c', mu=0, sd=10)
    K_c = pm.gp.cov.Constant(c)
    c2 = pm.Normal('c2', mu=0, sd=10)
    nu2 = pm.HalfNormal('nu2', sd=10)
    K_lin = nu2 * pm.gp.cov.Linear(1, c2)
    K = K_c + K_lin
  
    # GP likelihood
    mu = pm.gp.mean.Zero()
    gp = pm.gp.Marginal(mean_func=mu, cov_func=K)
    sigma = pm.HalfCauchy('sigma', 2.5)
    y_obs = gp.marginal_likelihood("y_obs", X=t.reshape((-1,1)), y=y, noise=sigma)
    
    # Sample
    step = pm.Metropolis(vars=model.free_RVs)
    trace = pm.sample(500, step=step)

with model:
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc(trace, vars=[f_pred], samples=100)

Data:

t = np.array([  12,   27,   42,   57,   72,   87,  102,  117,  132,  147,  162,
        177,  192,  207,  222,  237,  253,  257,  268,  283,  298,  313,
        328,  343,  358,  373,  388,  403,  418,  434,  445,  449,  464,
        479,  494,  510,  512,  525,  539,  540,  555,  570,  582,  585,
        600,  615,  624,  630,  645,  660,  676,  689,  691,  706,  721,
        722,  736,  751,  762,  766,  782,  788,  797,  812,  821,  827,
        842,  857,  872,  877,  888,  900,  902,  908,  917,  932,  948,
        953,  963,  978,  981,  993, 1001, 1009, 1018, 1024, 1039, 1054,
       1069, 1084, 1099, 1114, 1130, 1131, 1145, 1160, 1175, 1178, 1189,
       1193, 1204, 1205, 1220, 1230, 1235, 1250, 1254, 1265, 1277, 1280,
       1299, 1307, 1314, 1329, 1344, 1359, 1374, 1389, 1404, 1419, 1434,
       1449, 1464, 1479, 1494, 1509, 1524, 1539, 1554, 1569, 1584, 1599,
       1614, 1629, 1644, 1659, 1674, 1689, 1705, 1710, 1720, 1735, 1750,
       1765, 1780, 1795, 1808, 1810, 1819, 1825, 1840, 1847, 1855, 1870,
       1886, 1892, 1901, 1916, 1931, 1941, 1946, 1952, 1961, 1976, 1991,
       2006, 2021, 2026, 2037, 2041, 2052, 2067, 2082, 2097, 2113, 2119,
       2127, 2138, 2142, 2158, 2163, 2173, 2180, 2188, 2203, 2218, 2225,
       2233, 2248, 2263, 2278, 2294, 2298, 2309, 2324, 2339, 2355, 2370,
       2384, 2388, 2399, 2415, 2417, 2430, 2444, 2453, 2459, 2476, 2478,
       2491, 2505, 2506, 2521, 2534, 2536, 2551, 2566, 2581, 2597, 2599,
       2612, 2627, 2642, 2656, 2660, 2672, 2684, 2687, 2696, 2705, 2706,
       2720, 2735, 2750, 2765, 2780, 2795, 2810, 2825, 2840, 2855, 2870,
       2885, 2900, 2915, 2930, 2945, 2960, 2975, 2990, 3005, 3020, 3035,
       3050, 3065, 3080, 3095, 3110, 3125, 3140, 3155, 3170, 3172, 3185,
       3200, 3217, 3229, 3232, 3247, 3262, 3277, 3292, 3307, 3314, 3322,
       3324, 3333, 3338, 3346, 3353, 3368, 3383, 3398, 3410, 3413, 3428,
       3443, 3458, 3468, 3473, 3484, 3488, 3503, 3517, 3518, 3534, 3536,
       3549, 3564, 3573, 3579, 3594, 3609, 3625, 3633, 3640, 3655, 3670,
       3685, 3700, 3715, 3730, 3732, 3746, 3750, 3761, 3776, 3791, 3806,
       3821, 3836, 3851, 3866, 3882, 3895, 3897, 3912, 3927, 3942, 3957,
       3972, 3987, 4002, 4018, 4025, 4033, 4047, 4052, 4062, 4064, 4078,
       4079, 4093, 4100, 4108, 4123, 4138, 4141, 4156, 4157, 4172, 4187,
       4202, 4217, 4232, 4247, 4262, 4277, 4292, 4307])
y = np.array([5.2, 5.4, 5.6, 5.4, 5.5, 5.8, 5.8, 5.6, 5.6, 5.6, 5.3, 5.1, 5.2,
       5.3, 5.4, 5.5, 5.6, 6.4, 5.2, 5.2, 5.6, 5.8, 5.7, 5.7, 5.5, 5.2,
       5.2, 5.4, 5.4, 5.8, 6.7, 6.1, 6.2, 6.2, 6.2, 6.2, 6.1, 6. , 5.9,
       6.1, 6.4, 7.1, 7. , 6.9, 6.4, 5.8, 5.6, 5.4, 5.4, 5.4, 5.4, 5.5,
       5.2, 5.1, 5.2, 5.1, 5.2, 5.3, 5.3, 5.5, 5.5, 5.4, 5.4, 5.8, 6.2,
       5.5, 5. , 4.9, 5. , 5.1, 5.3, 6.3, 5.8, 5.9, 5.3, 4.9, 4.9, 4.9,
       5.3, 5.7, 6.4, 5.3, 4.9, 5.3, 5.4, 5.1, 5.1, 5.2, 5. , 4.7, 4.6,
       4.7, 4.9, 4.8, 4.9, 4.9, 5. , 5.1, 5. , 5.3, 5.2, 5.6, 6.3, 6.4,
       6.3, 5.9, 5.8, 5.8, 5.6, 5.6, 5.3, 5.2, 5.5, 5.7, 5.7, 5.4, 5.3,
       5.4, 5.6, 5.8, 5.8, 5.8, 5.6, 5.4, 5.3, 5.3, 5.2, 5.2, 5.3, 5.3,
       5.2, 4.9, 4.7, 4.7, 4.8, 5.3, 5.6, 5.7, 5.4, 5.8, 5.2, 5.6, 5.6,
       5.5, 5.7, 5.4, 4.9, 5.5, 5.7, 5.6, 5.4, 5.4, 5.5, 5.5, 5.7, 5.8,
       5.9, 6.2, 6.3, 6.3, 5.9, 6.1, 5.3, 5.3, 5.3, 5.3, 5.4, 5.4, 5.4,
       5.4, 5.4, 5.3, 5.3, 5.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.5, 5.6, 6.3,
       6.6, 6.8, 6.8, 6.2, 6.1, 5.8, 5.6, 5.4, 5.4, 5.4, 5.2, 5.3, 5.3,
       5.3, 5.2, 5.3, 6. , 5.9, 7.1, 7.3, 7.3, 6.7, 6.2, 5.9, 6.3, 5.9,
       5.6, 5.7, 6. , 5.9, 6.2, 5.8, 6.3, 5.8, 5.1, 4.8, 4.8, 5.3, 4.6,
       5.1, 5.6, 5.2, 5.1, 5.1, 4.7, 5.2, 4.9, 5.3, 5.5, 5.4, 5.7, 5.6,
       5.5, 5.5, 5.6, 5.8, 5.9, 5.9, 5.6, 5.4, 5.5, 5.6, 5.6, 5.5, 5.1,
       4.8, 4.8, 5. , 5.2, 5.3, 5.3, 5.2, 5.3, 5.6, 5.8, 5.8, 5.7, 5.7,
       5.9, 5.7, 5.6, 5.7, 5.8, 5.6, 5.8, 5.8, 5.9, 6.1, 6.3, 6.3, 6.2,
       6.3, 6.2, 6.2, 6.3, 6.7, 7.1, 6.7, 5.9, 5.4, 5.6, 5.5, 5.6, 5.4,
       5.5, 5.7, 6.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.2, 5.3, 5.2, 5.3, 5.3,
       5.2, 5.3, 5.2, 6.1, 7.1, 7.4, 7.3, 6.8, 6.3, 5.9, 5.5, 5.3, 5.2,
       5.2, 5.1, 4.9, 5.2, 5.1, 4.8, 4.8, 5.3, 6.4, 7.6, 6.8, 6.1, 6.1,
       5.9, 5.8, 5.7, 5.6, 5.4, 5.3, 5.3, 5.3, 5.1, 5.1, 4.9, 5.1, 5.1,
       5.3, 5.2, 5.9, 5.9, 6.8, 6.7, 6.6, 5.6, 5.9, 6.1, 6.3, 5.9, 5.9,
       6.3, 6.5, 6.4, 6.4, 6.6, 6.9, 6.6, 6. , 5.7])
#ts = np.linspace(0, t[-1], 100)
ts = t

Why do you sample with Metropolis? You should use the default.

There are many reasons why the covariance might not be positive semidefinite, especially when the model is badly fitted, which results in numerical error when you try to invert the resulting covariance.

This is a simplified model from my project, where I have to try Metropolis first because NUTS gets stuck easily.
Would you mind elaborating a little more on the numerical error? I have no idea how to make this work.

Even when the parameters are badly fitted, they are still valid parameters for the covariance function. I would assume a valid covariance function always produces an invertible covariance matrix.
Please correct me if I am wrong.
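
As a quick sanity check on that assumption, here is a minimal sketch that evaluates the Constant + Linear Gram matrix on t using arbitrary placeholder parameter values (c_val, c2_val, nu2_val are made up, not fitted values) and prints its smallest eigenvalues:

import numpy as np
import pymc3 as pm

# made-up parameter values, only to evaluate the Gram matrix
c_val, c2_val, nu2_val = 1.0, 500.0, 0.01

K_check = pm.gp.cov.Constant(c_val) + nu2_val * pm.gp.cov.Linear(1, c2_val)
Kxx = K_check(t.reshape((-1, 1))).eval()

# Constant + Linear gives a rank-2 Gram matrix, so in exact arithmetic
# most of its eigenvalues are exactly zero; in floating point they can
# come out as tiny negative numbers, which is enough to fail a PSD check.
print(np.sort(np.linalg.eigvalsh(Kxx))[:5])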

If NUTS gets stuck easily, then that is a strong indication of a problem in your model. Metropolis won't solve the problem, it just hides it - but you will see it downstream, like the positive-definiteness error you are seeing now. Likely Metropolis is not providing valid samples. Did you look at the trace?

All right. I switched to NUTS, which gives more or less the same samples as Metropolis, and this still happens.
It takes about 7 minutes for NUTS to sample, and convergence for both Metropolis and NUTS looks pretty good in terms of R-hat and the number of divergences.
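
For reference, this is roughly how I checked that (a minimal sketch; pm.summary, pm.traceplot and the 'diverging' sampler statistic are standard PyMC3 diagnostics, assuming the trace comes from NUTS):

import pymc3 as pm

# R-hat and effective sample sizes for the free parameters
print(pm.summary(trace))

# number of divergent transitions reported by NUTS
divergent = trace.get_sampler_stats('diverging')
print("divergences:", int(divergent.sum()))

# visual check of mixing
pm.traceplot(trace)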

Actually, I suspect that the error comes from the Kernel function instead of the samples.

Hi, if you are getting NaN values, it might be because the priors are not right. Try setting a Lognormal or HalfNormal prior on the lengthscale (c2), because it is always going to be positive. Your covariance matrix might not be positive semidefinite for these reasons, or possibly it does not remain so after combining the kernels.

Thanks for your reply.
c2 is just a parameter of the linear kernel, which determines the x-coordinate of the point that all the lines in the posterior go through, so I don't see why it has to be positive.
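
To illustrate with a minimal sketch (the negative offset below is made up): the Linear kernel is k(x, x') = (x - c2)(x' - c2), which stays positive semidefinite regardless of the sign of c2.

import numpy as np
import pymc3 as pm

x = np.linspace(0., 10., 5)[:, None]

# Linear kernel with a negative offset; the Gram matrix is still PSD,
# so its eigenvalues are >= 0 up to floating-point round-off.
K_lin = pm.gp.cov.Linear(1, -3.0)
print(np.linalg.eigvalsh(K_lin(x).eval()))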

I have now investigated the problem a bit more.
The problem is indeed caused by a non-positive-definite covariance matrix, but numpy.random.multivariate_normal can sample from it successfully, while the stats.multivariate_normal used by PyMC3 fails.
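
Here is a small self-contained illustration of that difference. The matrix below is made up (one slightly negative eigenvalue standing in for the round-off in the conditional covariance); it is not the actual covariance from the model:

import warnings
import numpy as np
from scipy import stats

# symmetric matrix with eigenvalues (1.0, 0.1, -1e-4): positive diagonal,
# looks like a covariance, but is not quite positive semidefinite
rng = np.random.RandomState(0)
Q, _ = np.linalg.qr(rng.randn(3, 3))
cov = (Q * np.array([1.0, 0.1, -1e-4])) @ Q.T
mean = np.zeros(3)

# numpy: with the default check_valid='warn' it emits
# "RuntimeWarning: covariance is not positive-semidefinite."
# but still returns samples
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    draws = np.random.multivariate_normal(mean, cov, size=5)
print(draws.shape, [str(w.message) for w in caught])

# scipy: refuses outright with an error about positive semidefiniteness
try:
    stats.multivariate_normal(mean, cov)
except ValueError as err:
    print("scipy:", err)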

@junpenglao Do you think this would be of interest to the PyMC3 devs?

The way I found out was by printing ss in pymc3/distributions/multivariate.py:MvNormal.
I am not an expert in PyMC3, so my way of debugging might be a bit awkward.

Right, c2 is not a lengthscale, I misread. :joy: But usually that's the reason for NaNs (from my past experience). The code below works:

import matplotlib.pyplot as plt
from pymc3.gp.util import plot_gp_dist

with pm.Model() as model:
    # Kernel
    c = pm.Lognormal('c', mu=0, sd=10)
    K_c = pm.gp.cov.Constant(c)
    c2 = pm.Lognormal('c2', mu=0, sd=10)
    nu2 = pm.HalfNormal('nu2', sd=10)
    K_lin = nu2 * pm.gp.cov.Linear(1, c2)
    K = K_c + K_lin

    # GP likelihood
    mu = pm.gp.mean.Zero()
    gp = pm.gp.Marginal(mean_func=mu, cov_func=K)
    sigma = pm.HalfCauchy('sigma', 2.5)
    y_obs = gp.marginal_likelihood("y_obs", X=t.reshape((-1,1)), y=y, noise=sigma)

    # MAP estimate instead of full MCMC sampling
    mp = pm.find_MAP()
    print(sorted([name+":"+str(mp[name]) for name in mp.keys() if not name.endswith("_")]))

    # conditional at the test points, drawn at the MAP point
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc([mp], vars=[f_pred], samples=100)

fig = plt.figure(figsize=(17,10)); ax = fig.gca()
plot_gp_dist(ax, ppc['f_pred'], ts);

Used MAP because sampling was too slow.

Hi Rohan,

Thanks for your help!
However, I had already tried MAP myself, and it worked on the original model as well.
I avoid using MAP since it is biased in high-dimensional cases.

Edited:
Sorry, I skipped over the last sentence. I will give it a try. Thanks!

Unfortunately, it does not work.

I also tried increasing the magnitude of the noise (from 1e-5 up to 10), which also failed.

I finally got this working by adding larger diagonal noise (jitter).

In the source code, in pymc3/gp, Marginal._build_conditional():

#return mu, cov if pred_noise else stabilize(cov)
return mu, cov if pred_noise else cov + 1e-4 * tt.identity_like(cov)
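
A possible non-invasive alternative, assuming it is acceptable to predict the noisy observations rather than the latent function: passing pred_noise=True to gp.conditional adds the fitted noise variance to the diagonal of the predictive covariance, which has a similar stabilizing effect without patching the library.

with model:
    # predictive distribution of y* (latent f* plus observation noise);
    # the extra diagonal term also keeps the covariance positive definite
    y_pred = gp.conditional("y_pred", Xnew=ts[:, None], pred_noise=True)
    ppc = pm.sample_ppc(trace, vars=[y_pred], samples=100)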