A very simple GP fails, giving a non-positive-semidefinite covariance and NaN predictions

When sampling from the posterior predictive, it gives the following warning:

with model:
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc(trace, vars=[f_pred], samples=100)

RuntimeWarning: covariance is not positive-semidefinite.
  out = random_state.multivariate_normal(mean, cov, size)

IMO, there is no way the covariance can be non-PSD, since it is computed solely by the built-in covariance functions.
And the predictions for the test data (ppc['f_pred']) contain a lot of NaN values.

See the model first, followed by the data.

with pm.Model() as model:
    # Kernel
    c = pm.Normal('c', mu=0, sd=10)
    K_c = pm.gp.cov.Constant(c)
    c2 = pm.Normal('c2', mu=0, sd=10)
    nu2 = pm.HalfNormal('nu2', sd=10)
    K_lin = nu2 * pm.gp.cov.Linear(1, c2)
    K = K_c + K_lin
  
    # GP likelihood
    mu = pm.gp.mean.Zero()
    gp = pm.gp.Marginal(mean_func=mu, cov_func=K)
    sigma = pm.HalfCauchy('sigma', 2.5)
    y_obs = gp.marginal_likelihood("y_obs", X=t.reshape((-1,1)), y=y, noise=sigma)
    
    # Sample
    step = pm.Metropolis(vars=model.free_RVs)
    trace = pm.sample(500, step=step)

with model:
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc(trace, vars=[f_pred], samples=100)

Data:

t = np.array([  12,   27,   42,   57,   72,   87,  102,  117,  132,  147,  162,
        177,  192,  207,  222,  237,  253,  257,  268,  283,  298,  313,
        328,  343,  358,  373,  388,  403,  418,  434,  445,  449,  464,
        479,  494,  510,  512,  525,  539,  540,  555,  570,  582,  585,
        600,  615,  624,  630,  645,  660,  676,  689,  691,  706,  721,
        722,  736,  751,  762,  766,  782,  788,  797,  812,  821,  827,
        842,  857,  872,  877,  888,  900,  902,  908,  917,  932,  948,
        953,  963,  978,  981,  993, 1001, 1009, 1018, 1024, 1039, 1054,
       1069, 1084, 1099, 1114, 1130, 1131, 1145, 1160, 1175, 1178, 1189,
       1193, 1204, 1205, 1220, 1230, 1235, 1250, 1254, 1265, 1277, 1280,
       1299, 1307, 1314, 1329, 1344, 1359, 1374, 1389, 1404, 1419, 1434,
       1449, 1464, 1479, 1494, 1509, 1524, 1539, 1554, 1569, 1584, 1599,
       1614, 1629, 1644, 1659, 1674, 1689, 1705, 1710, 1720, 1735, 1750,
       1765, 1780, 1795, 1808, 1810, 1819, 1825, 1840, 1847, 1855, 1870,
       1886, 1892, 1901, 1916, 1931, 1941, 1946, 1952, 1961, 1976, 1991,
       2006, 2021, 2026, 2037, 2041, 2052, 2067, 2082, 2097, 2113, 2119,
       2127, 2138, 2142, 2158, 2163, 2173, 2180, 2188, 2203, 2218, 2225,
       2233, 2248, 2263, 2278, 2294, 2298, 2309, 2324, 2339, 2355, 2370,
       2384, 2388, 2399, 2415, 2417, 2430, 2444, 2453, 2459, 2476, 2478,
       2491, 2505, 2506, 2521, 2534, 2536, 2551, 2566, 2581, 2597, 2599,
       2612, 2627, 2642, 2656, 2660, 2672, 2684, 2687, 2696, 2705, 2706,
       2720, 2735, 2750, 2765, 2780, 2795, 2810, 2825, 2840, 2855, 2870,
       2885, 2900, 2915, 2930, 2945, 2960, 2975, 2990, 3005, 3020, 3035,
       3050, 3065, 3080, 3095, 3110, 3125, 3140, 3155, 3170, 3172, 3185,
       3200, 3217, 3229, 3232, 3247, 3262, 3277, 3292, 3307, 3314, 3322,
       3324, 3333, 3338, 3346, 3353, 3368, 3383, 3398, 3410, 3413, 3428,
       3443, 3458, 3468, 3473, 3484, 3488, 3503, 3517, 3518, 3534, 3536,
       3549, 3564, 3573, 3579, 3594, 3609, 3625, 3633, 3640, 3655, 3670,
       3685, 3700, 3715, 3730, 3732, 3746, 3750, 3761, 3776, 3791, 3806,
       3821, 3836, 3851, 3866, 3882, 3895, 3897, 3912, 3927, 3942, 3957,
       3972, 3987, 4002, 4018, 4025, 4033, 4047, 4052, 4062, 4064, 4078,
       4079, 4093, 4100, 4108, 4123, 4138, 4141, 4156, 4157, 4172, 4187,
       4202, 4217, 4232, 4247, 4262, 4277, 4292, 4307])
y = np.array([5.2, 5.4, 5.6, 5.4, 5.5, 5.8, 5.8, 5.6, 5.6, 5.6, 5.3, 5.1, 5.2,
       5.3, 5.4, 5.5, 5.6, 6.4, 5.2, 5.2, 5.6, 5.8, 5.7, 5.7, 5.5, 5.2,
       5.2, 5.4, 5.4, 5.8, 6.7, 6.1, 6.2, 6.2, 6.2, 6.2, 6.1, 6. , 5.9,
       6.1, 6.4, 7.1, 7. , 6.9, 6.4, 5.8, 5.6, 5.4, 5.4, 5.4, 5.4, 5.5,
       5.2, 5.1, 5.2, 5.1, 5.2, 5.3, 5.3, 5.5, 5.5, 5.4, 5.4, 5.8, 6.2,
       5.5, 5. , 4.9, 5. , 5.1, 5.3, 6.3, 5.8, 5.9, 5.3, 4.9, 4.9, 4.9,
       5.3, 5.7, 6.4, 5.3, 4.9, 5.3, 5.4, 5.1, 5.1, 5.2, 5. , 4.7, 4.6,
       4.7, 4.9, 4.8, 4.9, 4.9, 5. , 5.1, 5. , 5.3, 5.2, 5.6, 6.3, 6.4,
       6.3, 5.9, 5.8, 5.8, 5.6, 5.6, 5.3, 5.2, 5.5, 5.7, 5.7, 5.4, 5.3,
       5.4, 5.6, 5.8, 5.8, 5.8, 5.6, 5.4, 5.3, 5.3, 5.2, 5.2, 5.3, 5.3,
       5.2, 4.9, 4.7, 4.7, 4.8, 5.3, 5.6, 5.7, 5.4, 5.8, 5.2, 5.6, 5.6,
       5.5, 5.7, 5.4, 4.9, 5.5, 5.7, 5.6, 5.4, 5.4, 5.5, 5.5, 5.7, 5.8,
       5.9, 6.2, 6.3, 6.3, 5.9, 6.1, 5.3, 5.3, 5.3, 5.3, 5.4, 5.4, 5.4,
       5.4, 5.4, 5.3, 5.3, 5.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.5, 5.6, 6.3,
       6.6, 6.8, 6.8, 6.2, 6.1, 5.8, 5.6, 5.4, 5.4, 5.4, 5.2, 5.3, 5.3,
       5.3, 5.2, 5.3, 6. , 5.9, 7.1, 7.3, 7.3, 6.7, 6.2, 5.9, 6.3, 5.9,
       5.6, 5.7, 6. , 5.9, 6.2, 5.8, 6.3, 5.8, 5.1, 4.8, 4.8, 5.3, 4.6,
       5.1, 5.6, 5.2, 5.1, 5.1, 4.7, 5.2, 4.9, 5.3, 5.5, 5.4, 5.7, 5.6,
       5.5, 5.5, 5.6, 5.8, 5.9, 5.9, 5.6, 5.4, 5.5, 5.6, 5.6, 5.5, 5.1,
       4.8, 4.8, 5. , 5.2, 5.3, 5.3, 5.2, 5.3, 5.6, 5.8, 5.8, 5.7, 5.7,
       5.9, 5.7, 5.6, 5.7, 5.8, 5.6, 5.8, 5.8, 5.9, 6.1, 6.3, 6.3, 6.2,
       6.3, 6.2, 6.2, 6.3, 6.7, 7.1, 6.7, 5.9, 5.4, 5.6, 5.5, 5.6, 5.4,
       5.5, 5.7, 6.2, 5.2, 5.3, 5.3, 5.3, 5.3, 5.2, 5.3, 5.2, 5.3, 5.3,
       5.2, 5.3, 5.2, 6.1, 7.1, 7.4, 7.3, 6.8, 6.3, 5.9, 5.5, 5.3, 5.2,
       5.2, 5.1, 4.9, 5.2, 5.1, 4.8, 4.8, 5.3, 6.4, 7.6, 6.8, 6.1, 6.1,
       5.9, 5.8, 5.7, 5.6, 5.4, 5.3, 5.3, 5.3, 5.1, 5.1, 4.9, 5.1, 5.1,
       5.3, 5.2, 5.9, 5.9, 6.8, 6.7, 6.6, 5.6, 5.9, 6.1, 6.3, 5.9, 5.9,
       6.3, 6.5, 6.4, 6.4, 6.6, 6.9, 6.6, 6. , 5.7])
#ts = np.linspace(0, t[-1], 100)
ts = t

Why do you sample with Metropolis? You should use the default.

There are many reasons why the covariance might not be positive semidefinite, especially when the model is badly fitted, which results in numerical error when you try to invert the resulting covariance.

This is a simplified model from my project, where I have to try Metropolis first because NUTS gets stuck easily.
Would you mind elaborating a little more on the numerical error? I have no idea how to make this work.

Even when the parameters are badly fitted, they are still valid parameters for the covariance function. I would assume a valid covariance function always produces an invertible covariance matrix.
Please correct me if I am wrong.
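
As a quick sanity check on that assumption, here is a minimal sketch that evaluates the Constant + Linear Gram matrix on t using arbitrary placeholder parameter values (c_val, c2_val, nu2_val are made up, not fitted values) and prints its smallest eigenvalues:

import numpy as np
import pymc3 as pm

# made-up parameter values, only to evaluate the Gram matrix
c_val, c2_val, nu2_val = 1.0, 500.0, 0.01

K_check = pm.gp.cov.Constant(c_val) + nu2_val * pm.gp.cov.Linear(1, c2_val)
Kxx = K_check(t.reshape((-1, 1))).eval()

# Constant + Linear gives a rank-2 Gram matrix, so in exact arithmetic
# most of its eigenvalues are exactly zero; in floating point they can
# come out as tiny negative numbers, which is enough to fail a PSD check.
print(np.sort(np.linalg.eigvalsh(Kxx))[:5])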

If NUTS gets stuck easily, then that is a strong indication of a problem in your model. Metropolis won't solve the problem, it just hides it - but you will see it downstream, like the positive-definiteness error you are seeing now. Likely Metropolis is not providing valid samples. Did you look at the trace?

All right. I switched to NUTS, which gives more or less the same samples as Metropolis, and this still happens.
It takes about 7 minutes for NUTS to sample, and convergence for both Metropolis and NUTS looks pretty good in terms of R-hat and the number of divergences.
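
For reference, this is roughly how I checked that (a minimal sketch; pm.summary, pm.traceplot and the 'diverging' sampler statistic are standard PyMC3 diagnostics, assuming the trace comes from NUTS):

import pymc3 as pm

# R-hat and effective sample sizes for the free parameters
print(pm.summary(trace))

# number of divergent transitions reported by NUTS
divergent = trace.get_sampler_stats('diverging')
print("divergences:", int(divergent.sum()))

# visual check of mixing
pm.traceplot(trace)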

Actually, I suspect that the error comes from the Kernel function instead of the samples.

Hi, if you are getting NaN values, it might be because the priors are not right. Try setting a Lognormal or HalfNormal prior on the lengthscale (c2), because it is always going to be positive. Your covariance matrix might not be positive semidefinite for these reasons, or possibly it does not remain so after combining the kernels.

Thanks for your reply.
c2 is just a parameter of the linear kernel, which determines the x-coordinate of the point that all the lines in the posterior go through, so I don't see why it has to be positive.
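
To illustrate with a minimal sketch (the negative offset below is made up): the Linear kernel is k(x, x') = (x - c2)(x' - c2), which stays positive semidefinite regardless of the sign of c2.

import numpy as np
import pymc3 as pm

x = np.linspace(0., 10., 5)[:, None]

# Linear kernel with a negative offset; the Gram matrix is still PSD,
# so its eigenvalues are >= 0 up to floating-point round-off.
K_lin = pm.gp.cov.Linear(1, -3.0)
print(np.linalg.eigvalsh(K_lin(x).eval()))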

I have now investigated the problem a bit more.
The problem is indeed caused by a non-positive-definite covariance matrix, but numpy.random.multivariate_normal can sample from it successfully, while the stats.multivariate_normal used by PyMC3 fails.
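
Here is a small self-contained illustration of that difference. The matrix below is made up (one slightly negative eigenvalue standing in for the round-off in the conditional covariance); it is not the actual covariance from the model:

import warnings
import numpy as np
from scipy import stats

# symmetric matrix with eigenvalues (1.0, 0.1, -1e-4): positive diagonal,
# looks like a covariance, but is not quite positive semidefinite
rng = np.random.RandomState(0)
Q, _ = np.linalg.qr(rng.randn(3, 3))
cov = (Q * np.array([1.0, 0.1, -1e-4])) @ Q.T
mean = np.zeros(3)

# numpy: with the default check_valid='warn' it emits
# "RuntimeWarning: covariance is not positive-semidefinite."
# but still returns samples
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    draws = np.random.multivariate_normal(mean, cov, size=5)
print(draws.shape, [str(w.message) for w in caught])

# scipy: refuses outright with an error about positive semidefiniteness
try:
    stats.multivariate_normal(mean, cov)
except ValueError as err:
    print("scipy:", err)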

@junpenglao Do you think this would be of interest to the PyMC3 devs?

The way I found out was by printing ss in pymc3/distributions/multivariate.py:MvNormal.
I am not an expert in PyMC3, so my way of debugging might be a bit awkward.

Right, c2 is not a lengthscale, I misread. :joy: But usually that's the reason for NaNs (from my past experience). The code below works:

import matplotlib.pyplot as plt
from pymc3.gp.util import plot_gp_dist

with pm.Model() as model:
    # Kernel
    c = pm.Lognormal('c', mu=0, sd=10)
    K_c = pm.gp.cov.Constant(c)
    c2 = pm.Lognormal('c2', mu=0, sd=10)
    nu2 = pm.HalfNormal('nu2', sd=10)
    K_lin = nu2 * pm.gp.cov.Linear(1, c2)
    K = K_c + K_lin

    # GP likelihood
    mu = pm.gp.mean.Zero()
    gp = pm.gp.Marginal(mean_func=mu, cov_func=K)
    sigma = pm.HalfCauchy('sigma', 2.5)
    y_obs = gp.marginal_likelihood("y_obs", X=t.reshape((-1,1)), y=y, noise=sigma)

    # MAP estimate instead of full MCMC sampling
    mp = pm.find_MAP()
    print(sorted([name+":"+str(mp[name]) for name in mp.keys() if not name.endswith("_")]))

    # conditional at the test points, drawn at the MAP point
    f_pred = gp.conditional("f_pred", Xnew=ts[:,None])
    ppc = pm.sample_ppc([mp], vars=[f_pred], samples=100)

fig = plt.figure(figsize=(17,10)); ax = fig.gca()
plot_gp_dist(ax, ppc['f_pred'], ts);

Used MAP because sampling was too slow.

Hi Rohan,

Thanks for your help!
However, I had already tried MAP myself, and it worked on the original model as well.
I avoid using MAP since it is biased in high-dimensional cases.

Edited:
Sorry, I skipped over the last sentence. I will give it a try. Thanks!

Unfortunately, it does not work.

I also tried increasing the magnitude of the noise (from 1e-5 up to 10), which also failed.

I finally got this working by adding larger diagonal noise (jitter).

In the source code, in pymc3/gp, Marginal._build_conditional():

#return mu, cov if pred_noise else stabilize(cov)
return mu, cov if pred_noise else cov + 1e-4 * tt.identity_like(cov)
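
A possible non-invasive alternative, assuming it is acceptable to predict the noisy observations rather than the latent function: passing pred_noise=True to gp.conditional adds the fitted noise variance to the diagonal of the predictive covariance, which has a similar stabilizing effect without patching the library.

with model:
    # predictive distribution of y* (latent f* plus observation noise);
    # the extra diagonal term also keeps the covariance positive definite
    y_pred = gp.conditional("y_pred", Xnew=ts[:, None], pred_noise=True)
    ppc = pm.sample_ppc(trace, vars=[y_pred], samples=100)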