Does normalizing/standardizing variables of interest cause "Bad initial energy" in a Gaussian Process model?

To brush up on my knowledge of Gaussian processes, I started working through the example where a GP is fit to the Mauna Loa dataset, and encountered an interesting error.

The example is informative and concise, but if its code is followed, the model ends up failing with a "Bad initial energy" error.

Following the basic example, this code leads to the bad energy error:

ice = pd.read_csv(pm.get_data("merged_ice_core_yearly.csv"), header=26)
ice.columns = ["year", "CO2"]
ice["CO2"] = ice["CO2"].astype(np.float)

#### DATA AFTER 1958 is an average of ice core and mauna loa data, so remove it
ice = ice[ice["year"] <= 1958]
print("Number of data points:", len(ice))

t = ice.year.values
y = ice.CO2.values

# normalize the CO2 readings prior to fitting the model
y_mu, y_sd = np.mean(y[0:50]), np.std(y)
y_n = (y - y_mu) / y_sd

# scale t to have units of centuries
t_n = t / 100

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)

    gp = pm.gp.Marginal(cov_func=cov)

    # x location uncertainty
    # - sd = 0.02 says the uncertainty on the point is about two years (t is in centuries)
    t_diff = pm.Normal("t_diff", mu=0.0, sd=0.02, shape=len(t))
    t_uncert = t_n - t_diff

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    
    y_ = gp.marginal_likelihood("y", X = t_uncert[:, None], y = y_n, noise = σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})

Tinkering around, I decided to do away with the x-location-uncertainty segment of the model (which allows for uncertainty in the independent variable), and ran this model instead:

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)

    gp = pm.gp.Marginal(cov_func=cov)

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    
    y_ = gp.marginal_likelihood("y", X = t_n[:, None], y = y_n, noise = σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})

I still get bad energy, so I decided to also do away with the normalized/standardized t_n and y_n. The model now starts, but either fails soon after or takes over 4 hours, which is far too long for a GP on 111 points.

None of the alternative initializations or samplers (e.g. init='advi', or the pm.fit() methods) work; they all come back with the same bad energy error.

with pm.Model() as model:
    η = pm.HalfNormal("η", sd=5)
    ℓ = pm.Gamma("ℓ", alpha=4, beta=2)
    α = pm.Gamma("α", alpha=3, beta=1)
    cov = η**2 * pm.gp.cov.RatQuad(1, α, ℓ)

    gp = pm.gp.Marginal(cov_func=cov)

    # white noise variance
    σ = pm.HalfNormal("σ", sd=5, testval=1)
    
    y_ = gp.marginal_likelihood("y", X = t[:, None], y = y, noise = σ)

with model:
    tr = pm.sample(1000, tune=1000, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})
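As a sanity check independent of PyMC3, you can compute by hand the quantity NUTS evaluates first: the GP marginal log-likelihood at the starting values. This is only a sketch; the data are toy stand-ins for the ice-core series and the hyperparameter values are illustrative, not the model's actual test point.

```python
import numpy as np

def rat_quad(x, eta, alpha, ell):
    # Rational-quadratic kernel: k(r) = eta^2 * (1 + r^2 / (2*alpha*ell^2))^(-alpha)
    d2 = (x[:, None] - x[None, :]) ** 2
    return eta**2 * (1.0 + d2 / (2.0 * alpha * ell**2)) ** (-alpha)

t_n = np.linspace(17.0, 19.58, 111)   # years rescaled to centuries
y_n = np.sin(t_n)                     # toy observations

# Covariance at illustrative starting values, plus sigma^2 on the diagonal
K = rat_quad(t_n, eta=1.0, alpha=3.0, ell=2.0) + 1.0**2 * np.eye(len(t_n))

# Gaussian marginal log-likelihood: -0.5 * (y' K^-1 y + log|K| + n log 2*pi)
sign, logdet = np.linalg.slogdet(K)
alpha_v = np.linalg.solve(K, y_n)
loglik = -0.5 * (y_n @ alpha_v + logdet + len(t_n) * np.log(2 * np.pi))

# If this were -inf or nan, NUTS would abort with "Bad initial energy"
print(np.isfinite(loglik))
```

Within PyMC3 itself, `model.check_test_point()` performs a similar check per variable and is usually the first thing to run when this error appears.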

How can it make sense theoretically that pre-processing the variables in a standard, straightforward way (as recommended virtually everywhere) creates trouble for MCMC? And on top of that, how can this model be made to work at all?

I would be grateful for any explanation experienced modellers might care to share, as well as advice on getting this model to run.

Thank you very much for your time.

A very quick comment, as I have not looked through your model: normalizing a variable can make a model unidentifiable, which is a likely cause of the bad energy error.

Thank you for your quick glance either way.

In that case, why would normalizing a variable make my model unidentifiable? The correlation between t and y above is only 0.63, so I would not expect identifiability issues to pop up.
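For what it's worth, one purely numerical side effect of rescaling t is easy to demonstrate (a sketch with illustrative hyperparameters; I am not claiming this is the cause): once the years are divided by 100, neighbouring inputs sit very close together relative to a lengthscale of order one, so the kernel matrix becomes far more ill-conditioned, leaving the Cholesky factorization inside the marginal likelihood much less numerical headroom.

```python
import numpy as np

def rat_quad(x, eta=1.0, alpha=3.0, ell=2.0):
    # Rational-quadratic kernel with illustrative hyperparameters
    d2 = (x[:, None] - x[None, :]) ** 2
    return eta**2 * (1.0 + d2 / (2.0 * alpha * ell**2)) ** (-alpha)

t = np.linspace(1700.0, 1958.0, 111)   # raw years
t_n = t / 100.0                        # rescaled to centuries

jitter = 1e-6 * np.eye(len(t))
cond_raw = np.linalg.cond(rat_quad(t) + jitter)
cond_scaled = np.linalg.cond(rat_quad(t_n) + jitter)

# The rescaled inputs give a far worse-conditioned covariance matrix
print(cond_raw, cond_scaled)
```

In the actual model the white-noise term σ² also adds to the diagonal, which helps considerably, but only when σ is not initialized too close to zero.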

I have run the example code (Python 3.7.1 [Anaconda3]; Theano 1.0.4; PyMC3 3.6) and I do not get a bad energy error.


OK, investigating whether this might be a version-specific bug, this is my current setup:

OS: Ubuntu 16.04.6 LTS
Python ver.: Python 3.6.6, Intel distribution
Numpy ver.: 1.16.3
Theano ver.: 1.0.4
Pymc3 ver.: 3.6

After forcibly upgrading Python to 3.7 with conda install python==3.7 (so no more Intel distribution), making sure I have the latest PyMC3 by pip-installing it from master, and downgrading NumPy to 1.15.0, the error still appears.

I will post the full bad energy traceback I see (first under Python 3.6, then under Python 3.7):

Python 3.6:

Auto-assigning NUTS sampler...
INFO:pymc3:Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
INFO:pymc3:Initializing NUTS using jitter+adapt_diag...
/home/furthermltesting/anaconda3/envs/idp/lib/python3.6/site-packages/theano/tensor/basic.py:6611: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  result[diagonal_slice] = x
Sequential sampling (2 chains in 1 job)
INFO:pymc3:Sequential sampling (2 chains in 1 job)
NUTS: [σ, α, ℓ, η]
INFO:pymc3:NUTS: [σ, α, ℓ, η]
  0%|          | 0/2000 [00:00<?, ?it/s]

---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
<ipython-input-9-4581e183b309> in <module>()
      1 with model:
----> 2     tr = pm.sample(1000, tune=1000, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})
      3 #     tr = pm.sample(n=10000, init='advi')

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, nuts_kwargs, step_kwargs, progressbar, model, random_seed, live_plot, discard_tuned_samples, live_plot_kwargs, compute_convergence_checks, use_mmap, **kwargs)
    457                 _log.info('Sequential sampling ({} chains in 1 job)'.format(chains))
    458                 _print_step_hierarchy(step)
--> 459                 trace = _sample_many(**sample_args)
    460 
    461         discard = tune if discard_tuned_samples else 0

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/sampling.py in _sample_many(draws, chain, chains, start, random_seed, step, **kwargs)
    503     for i in range(chains):
    504         trace = _sample(draws=draws, chain=chain + i, start=start[i],
--> 505                         step=step, random_seed=random_seed[i], **kwargs)
    506         if trace is None:
    507             if len(traces) == 0:

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/sampling.py in _sample(chain, progressbar, random_seed, start, draws, step, trace, tune, model, live_plot, live_plot_kwargs, **kwargs)
    547     try:
    548         strace = None
--> 549         for it, strace in enumerate(sampling):
    550             if live_plot:
    551                 if live_plot_kwargs is None:

~/anaconda3/envs/idp/lib/python3.6/site-packages/tqdm/_tqdm.py in __iter__(self)
   1020                 """), fp_write=getattr(self.fp, 'write', sys.stderr.write))
   1021 
-> 1022             for obj in iterable:
   1023                 yield obj
   1024                 # Update and possibly print the progressbar.

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    643                 step = stop_tuning(step)
    644             if step.generates_stats:
--> 645                 point, states = step.step(point)
    646                 if strace.supports_sampler_stats:
    647                     strace.record(point, states)

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/step_methods/arraystep.py in step(self, point)
    245 
    246         if self.generates_stats:
--> 247             apoint, stats = self.astep(array)
    248             point = self._logp_dlogp_func.array_to_full_dict(apoint)
    249             return point, stats

~/anaconda3/envs/idp/lib/python3.6/site-packages/pymc3/step_methods/hmc/base_hmc.py in astep(self, q0)
    147             )
    148             self._warnings.append(warning)
--> 149             raise SamplingError("Bad initial energy")
    150 
    151         adapt_step = self.tune and self.adapt_step_size

SamplingError: Bad initial energy

Python 3.7:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
/home/furthermltesting/anaconda3/envs/idp/lib/python3.7/site-packages/theano/tensor/basic.py:6611: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  result[diagonal_slice] = x
Sequential sampling (2 chains in 1 job)
NUTS: [σ, t_diff, α, ℓ, η]
  0%|          | 0/2000 [00:00<?, ?it/s]

---------------------------------------------------------------------------
SamplingError                             Traceback (most recent call last)
<ipython-input-10-4581e183b309> in <module>
      1 with model:
----> 2     tr = pm.sample(1000, tune=1000, chains=2, cores=1, nuts_kwargs={"target_accept":0.95})
      3 #     tr = pm.sample(n=10000, init='advi')

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/sampling.py in sample(draws, step, init, n_init, start, trace, chain_idx, chains, cores, tune, progressbar, model, random_seed, discard_tuned_samples, compute_convergence_checks, **kwargs)
    450                 _log.info('Sequential sampling ({} chains in 1 job)'.format(chains))
    451                 _print_step_hierarchy(step)
--> 452                 trace = _sample_many(**sample_args)
    453 
    454         discard = tune if discard_tuned_samples else 0

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/sampling.py in _sample_many(draws, chain, chains, start, random_seed, step, **kwargs)
    496     for i in range(chains):
    497         trace = _sample(draws=draws, chain=chain + i, start=start[i],
--> 498                         step=step, random_seed=random_seed[i], **kwargs)
    499         if trace is None:
    500             if len(traces) == 0:

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/sampling.py in _sample(chain, progressbar, random_seed, start, draws, step, trace, tune, model, **kwargs)
    538     try:
    539         strace = None
--> 540         for it, strace in enumerate(sampling):
    541             if it >= skip_first:
    542                 trace = MultiTrace([strace])

~/anaconda3/envs/idp/lib/python3.7/site-packages/tqdm/_tqdm.py in __iter__(self)
   1020                 """), fp_write=getattr(self.fp, 'write', sys.stderr.write))
   1021 
-> 1022             for obj in iterable:
   1023                 yield obj
   1024                 # Update and possibly print the progressbar.

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/sampling.py in _iter_sample(draws, step, start, trace, chain, tune, model, random_seed)
    627                 step = stop_tuning(step)
    628             if step.generates_stats:
--> 629                 point, states = step.step(point)
    630                 if strace.supports_sampler_stats:
    631                     strace.record(point, states)

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/step_methods/arraystep.py in step(self, point)
    245 
    246         if self.generates_stats:
--> 247             apoint, stats = self.astep(array)
    248             point = self._logp_dlogp_func.array_to_full_dict(apoint)
    249             return point, stats

~/anaconda3/envs/idp/lib/python3.7/site-packages/pymc3/step_methods/hmc/base_hmc.py in astep(self, q0)
    142             )
    143             self._warnings.append(warning)
--> 144             raise SamplingError("Bad initial energy")
    145 
    146         adapt_step = self.tune and self.adapt_step_size

SamplingError: Bad initial energy

What could possibly be going on here?

Try using random testvals? Like testval=np.random.gamma(...) for the gammas.
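A minimal sketch of that idea (the draws below are illustrative; note that NumPy's gamma takes shape and scale, so pm.Gamma(alpha, beta) maps to shape=alpha, scale=1/beta):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random starting values for the Gamma-distributed hyperparameters,
# drawn from the same priors the model uses.
start = {
    "ℓ": rng.gamma(shape=4.0, scale=1.0 / 2.0),  # pm.Gamma(alpha=4, beta=2)
    "α": rng.gamma(shape=3.0, scale=1.0 / 1.0),  # pm.Gamma(alpha=3, beta=1)
}
print(start)
```

These can be supplied either as testval=... on each distribution or via pm.sample(start=start).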

NumPy and Theano also link against whatever linear algebra libraries you have, and can be unhappy if multiple versions are installed. My head hurts just thinking about this. Good luck and godspeed…

So, following what chartl said about multiple linear algebra libraries, my thoughts immediately went to intel-mkl conflicting with openblas somewhere…

Using conda list, I noticed I indeed had multiple libraries with intel, mkl, or openblas in the build or channel name. I decided to uninstall anything containing the former two and reinstall everything with conda install -c conda-forge ..., so now I have the following, but PyMC3 still gives the same bad energy error:

# Name                    Version                   Build  Channel
absl-py                   0.7.1                    py37_0    conda-forge
astor                     0.7.1                      py_0    conda-forge
attrs                     19.1.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
binutils_impl_linux-64    2.31.1               h6176602_1  
binutils_linux-64         2.31.1               h6176602_6  
blas                      2.8                    openblas    conda-forge
bleach                    3.1.0                      py_0    conda-forge
bzip2                     1.0.6             h14c3975_1002    conda-forge
c-ares                    1.15.0            h14c3975_1001    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
certifi                   2019.3.9                 py37_0    conda-forge
cudatoolkit               8.0                           3  
cudnn                     7.1.3                 cuda8.0_0  
cycler                    0.10.0                     py_1    conda-forge
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.0                      py_0    conda-forge
defusedxml                0.5.0                      py_1    conda-forge
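A quick way to see which BLAS/LAPACK NumPy is actually linked against (a diagnostic sketch, not something from the thread):

```python
import numpy as np

# Prints the BLAS/LAPACK libraries NumPy was built against. If this
# reports MKL while Theano links OpenBLAS (or vice versa), mixed
# backends are a plausible source of trouble.
np.show_config()

# Quick sanity check of the linked LAPACK: a Cholesky factorization,
# the operation the GP marginal likelihood depends on internally.
A = np.array([[4.0, 2.0], [2.0, 3.0]])
L = np.linalg.cholesky(A)
print(np.allclose(L @ L.T, A))
```

On the Theano side, theano.config.blas.ldflags shows which BLAS it will link against.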

Thinking that some unseen remnant of my Intel Python distribution was left behind, I did a clean reinstall of Anaconda via wget and reinstalled everything into a Python 3.7.3 environment with conda install ... (only being forced to use conda install -c conda-forge theano to get a working Theano), and it works!

It seems that OpenBLAS, either on its own or installed alongside intel-mkl, produces these errors on at least Ubuntu 16.04.6.

Thanks for the suggestion, chartl!
