To brush up on my knowledge of Gaussian processes, I started working through the example where a GP is fit to the Mauna Loa dataset, and I have run into an interesting error.

The example is very informative and concise, but if its code is followed, the model fails with a `bad energy` error.

Following the basic example, this code leads to the `bad energy` warning:

```
import numpy as np
import pandas as pd
import pymc3 as pm

ice = pd.read_csv(pm.get_data("merged_ice_core_yearly.csv"), header=26)
ice.columns = ["year", "CO2"]
ice["CO2"] = ice["CO2"].astype(float)

# data after 1958 is an average of ice core and Mauna Loa data, so remove it
ice = ice[ice["year"] <= 1958]
print("Number of data points:", len(ice))

t = ice.year.values
y = ice.CO2.values

# normalize the CO2 readings prior to fitting the model
y_mu, y_sd = np.mean(y[0:50]), np.std(y)
y_n = (y - y_mu) / y_sd

# scale t to have units of centuries
t_n = t / 100

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    # x location uncertainty
    # - sd = 0.02 (in century units) says the uncertainty on each point is about two years
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.02, shape=len(t))
    t_uncert = t_n - t_diff

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t_uncert[:, None], y=y_n, noise=σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1,
                   nuts_kwargs={"target_accept": 0.95})
```
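(As a side check on the comment in the block above: the `sd = 0.02` is in century units, since `t_n = t / 100`, so on the raw year scale it does correspond to roughly two years of jitter. A quick numpy sanity check, independent of PyMC3:)

```
import numpy as np

rng = np.random.default_rng(0)
t_diff = rng.normal(0.0, 0.02, size=100_000)  # jitter in century units
print(np.std(t_diff * 100))                   # back on the year scale: ~2.0
```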

Tinkering around, I decided to do away with the `x location uncertainty` segment of the model (which allowed for uncertainty in the independent variable), and ran this model instead:

```
with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t_n[:, None], y=y_n, noise=σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1,
                   nuts_kwargs={"target_accept": 0.95})
```
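For reference, `bad energy` at the very first step usually means the model's log-probability is not finite at the initial point. For `gp.marginal_likelihood` the quantity being evaluated is the Gaussian marginal log-likelihood, which can be sketched in plain numpy with a hand-rolled rational quadratic kernel (intended to match the formula of `pm.gp.cov.RatQuad`; the parameter values here are illustrative, not draws from the priors above):

```
import numpy as np

def gp_marginal_logp(t, y, eta=1.0, alpha=3.0, ell=2.0, sigma=1.0):
    """log N(y | 0, K + sigma^2 I) with a rational quadratic kernel."""
    r2 = (t[:, None] - t[None, :]) ** 2
    K = eta**2 * (1.0 + r2 / (2.0 * alpha * ell**2)) ** (-alpha)
    K += sigma**2 * np.eye(len(t))
    _, logdet = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)
    return -0.5 * (quad + logdet + len(t) * np.log(2.0 * np.pi))

t_toy = np.linspace(0.0, 2.0, 50)      # 50 points over two "centuries"
y_toy = np.sin(2.0 * np.pi * t_toy)
print(np.isfinite(gp_marginal_logp(t_toy, y_toy)))  # True: the sampler can start
```

If this quantity comes out `nan` or `-inf` at the start point (for example, through a numerically singular covariance matrix, or a parameter initialized where its prior has zero density), NUTS reports `bad energy` before taking a single step.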

and I still get `bad energy`. So I decided to also do away with the normalized/standardized `t_n` and `y_n`, and the model below now **starts**, but fails soon enough (or takes 4+ hours, for a GP on 111 points).

None of the alternative samplers, e.g. `init='advi'` or the `pm.fit()` methods, work; they all bring back the `bad energy` error.

```
with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    y_ = gp.marginal_likelihood("y", X=t[:, None], y=y, noise=σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1,
                   nuts_kwargs={"target_accept": 0.95})
```

How can it make sense theoretically that pre-processing the variables in a regular, straightforward way (as recommended virtually everywhere) would *create* trouble for MCMC? And how could this model be made to work at all?
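One thing I noticed while puzzling over this (a rough intuition sketch, not a diagnosis): the lengthscale prior Gamma(4, 2) puts its mass around ℓ ≈ 2, which means something very different depending on whether the inputs are in centuries or in raw years. Evaluating a hand-rolled rational quadratic kernel at a one-century gap under both scalings (illustrative parameter values, not the fitted ones):

```
import numpy as np

def rat_quad(r, eta=1.0, alpha=3.0, ell=2.0):
    # rational quadratic: eta^2 * (1 + r^2 / (2*alpha*ell^2))^(-alpha)
    return eta**2 * (1.0 + r**2 / (2.0 * alpha * ell**2)) ** (-alpha)

print(rat_quad(1.0))    # scaled inputs (centuries): ~0.88, points stay correlated
print(rat_quad(100.0))  # raw years: the same gap, correlation ~1e-8
```

So the same priors describe a very different covariance structure under the two input scalings, which is presumably part of why rescaling can change the sampler's behavior so drastically in either direction.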

I would be grateful if any experienced modellers could share an explanation for this, along with advice on getting the model to run.

Thank you very much for your time.