NaN occurred in optimization with NUTS

junpenglao · June 13, 2017, 2:11pm

This is an issue I sometimes run into as well. A few examples on GitHub:
FloatingPointError: NaN occurred in optimization with NUTS #2272
Issues getting variable in hierarchical model to update #2271

My first instinct is usually: model misspecification.

For example, commenting on issue #2271:
I would check the logp of each random variable and of the model:

mu.logp(model.test_point)
model.logp(model.test_point)

Make sure that no logp is inf.
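
A minimal sketch of that check, not taken from the thread: the helper name `nonfinite_logps` and the example values are hypothetical, and the dictionary stands in for whatever `rv.logp(model.test_point)` returns for each variable.

```python
import math


def nonfinite_logps(logps):
    """Return the names whose log-probability is inf, -inf, or NaN."""
    return [name for name, value in logps.items() if not math.isfinite(value)]


# Hypothetical values standing in for rv.logp(model.test_point).
example = {"mu": -1.2, "sigma": float("-inf"), "obs": -35.7}
bad = nonfinite_logps(example)
print(bad)  # ['sigma']
```

With a PyMC3 model, the dictionary could probably be built as `{rv.name: float(rv.logp(model.test_point)) for rv in model.basic_RVs}`, assuming `basic_RVs` is available in your version.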

Another quick thing to try is to reduce the learning rate (1e-3 is the default):

obj_optimizer=pm.adagrad_window(learning_rate=2e-4)

as in:

approx = pm.fit(n=10000, method='advi', model=model,
            obj_optimizer=pm.adagrad_window(learning_rate=2e-4)
            )  # type: pm.MeanField

Other thoughts? @ferrine

ferrine · June 13, 2017, 8:17pm

There is a new option after #2257:

approx = pm.fit(n=10000, method='advi', model=model,
            obj_optimizer=pm.adagrad_window(learning_rate=2e-4),
            total_grad_norm_constraint=10  # other values can be used here; 10 is only an example, not a recommendation
            )
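
For intuition, here is a rough sketch of what a total gradient norm constraint does in general: if the joint norm of all gradients exceeds a threshold, every gradient is rescaled by the same factor. This is a plain-Python illustration under that assumption, not PyMC3's actual implementation, and `clip_total_norm` is a hypothetical name.

```python
import math


def clip_total_norm(grads, max_norm):
    """Rescale all gradients jointly so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm:
        return list(grads)  # already within the constraint
    scale = max_norm / total
    return [g * scale for g in grads]


# A gradient with norm 50 is scaled down to norm 10; its direction is unchanged.
clipped = clip_total_norm([30.0, 40.0], max_norm=10.0)
print(clipped)
```

This bounds the size of a single update, which can help when a few very large gradients early in the optimization push parameters into a region where the logp is no longer finite.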

aseyboldt · June 13, 2017, 9:53pm

Do we have an option for passing that into sample for advi initialization?
And is there a possibility for the advi optimizer to ignore a couple of nans/infs near the start of the optimization?

ferrine · June 13, 2017, 10:03pm

Yes, a start point is supported via the start kwarg. NaNs can’t be ignored, because they go into the updates and break everything. Infs can be ignored, as they don’t turn the updates into NaN.
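
To illustrate why a single NaN is fatal, here is a small sketch using a plain scalar Adagrad update. This is an assumption for illustration only: `pm.adagrad_window` uses a windowed accumulator, and `adagrad_step` is a hypothetical helper, but the propagation works the same way.

```python
import math


def adagrad_step(param, accum, grad, learning_rate=1e-3, eps=1e-6):
    """One scalar Adagrad update; returns the new parameter and accumulator."""
    accum = accum + grad * grad
    param = param - learning_rate * grad / math.sqrt(accum + eps)
    return param, accum


param, accum = 1.0, 0.0
# One NaN gradient in the middle, followed by perfectly fine gradients.
for grad in [0.5, float("nan"), 0.5, 0.5]:
    param, accum = adagrad_step(param, accum, grad)

print(param, accum)  # nan nan
```

Once the NaN has been written into the parameter, every later update is computed from NaN, so finite gradients afterwards cannot repair it.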

aseyboldt · June 13, 2017, 10:14pm

Sorry, my question wasn’t very precise. What I meant was a way to pass parameters like the learning rate to pm.sample, something analogous to nuts_kwargs/step_kwargs.
And about the NaNs: wouldn’t it be possible to always store the previous state before a step and then, if we encounter a NaN, go back one step and decrease the learning_rate? I don’t really know the literature on those optimizers well, so I hope I’m not asking something stupid here.
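
The rollback idea could look roughly like the following sketch. It is a toy scalar gradient descent, not ADVI and not anything implemented in PyMC3; `minimize_with_rollback`, `shrink`, and the example objective are all hypothetical.

```python
import math


def minimize_with_rollback(grad_fn, x0, learning_rate, n_steps=200, shrink=0.5):
    """Gradient descent that rejects steps producing NaN/inf and shrinks the step size."""
    x, g = x0, grad_fn(x0)
    if not math.isfinite(g):
        raise ValueError("gradient is not finite at the starting point")
    for _ in range(n_steps):
        candidate = x - learning_rate * g
        g_new = grad_fn(candidate)
        if not (math.isfinite(candidate) and math.isfinite(g_new)):
            # Keep the previous state and retry with a smaller learning rate.
            learning_rate *= shrink
            continue
        x, g = candidate, g_new
    return x, learning_rate


def grad(x):
    """Gradient of f(x) = x - log(x), which is undefined (NaN) for x <= 0."""
    return 1.0 - 1.0 / x if x > 0 else float("nan")


# The first step with learning_rate=4.0 lands on x = 0, where the gradient is NaN.
x_opt, final_lr = minimize_with_rollback(grad, x0=2.0, learning_rate=4.0)
print(x_opt, final_lr)
```

Here the first proposed step is rejected, the learning rate is halved, and the optimizer then reaches the minimum at x = 1 instead of failing.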

ferrine · June 13, 2017, 10:20pm

Hmm. Setting a starting mass matrix is not supported for ADVI right now. For now, ADVI in pm.sample uses the defaults.

Dealing with NaNs can be tricky. The only way I see is to create a custom callback that dumps the parameters every few steps.

aseyboldt · June 13, 2017, 10:30pm

Is it difficult from an algorithmic point of view or because of the implementation?
I think the “NaN occurred in optimization” errors are a major pain point for using NUTS right now. I’ve seen a couple of models that sampled well when using previous chains to initialize the scaling, but don’t work with the advi initialization because of some stray nan somewhere in the beginning. As a workaround I’ve used something like this:

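import numpy as np
import pymc3 as pm

with model:
    # Start from unit scaling and refine it over a few short NUTS runs.
    stds = np.ones(model.ndim)
    for _ in range(5):
        # Pass the current variance estimates as a diagonal covariance.
        args = {'scaling': stds ** 2, 'is_cov': True}
        trace = pm.sample(100, tune=100, init=None, nuts_kwargs=args)
        # Re-estimate the per-parameter standard deviations from the draws.
        samples = [model.dict_to_array(p) for p in trace]
        stds = np.array(samples).std(axis=0)

    # Final sampling: two chains that reuse the estimated scaling.
    traces = []
    for i in range(2):
        step = pm.NUTS(scaling=stds ** 2, is_cov=True, target_accept=0.9)
        # Start each chain from a different draw of the last short run.
        start = trace[-10 * i]
        trace_ = pm.sample(1000, tune=800, chain=i, init=None, step=step, start=start)
        traces.append(trace_)
    trace = pm.backends.base.merge_traces(traces)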

I guess we should put that somewhere in NUTS itself; this is similar to what Stan does for initialization.

ferrine · June 13, 2017, 10:35pm

That’s not difficult. I just haven’t thought about it.

junpenglao · May 29, 2018, 8:43am

A post was split to a new topic: NaN occured in optimization in a VonMises mixture model
