Average Loss in optimization output

mcavallaro · September 27, 2017, 4:51pm

Hi, could someone explain what exactly the average loss is that appears in the progress bar during initialisation (with ADVI)? What behaviour should we expect from that number?

Average Loss = 2.1759e+07: 100%|██████████| 10000/10000 [02:11<00:00, 89.31it/s]

I’m new to variational inference, and I didn’t find any documentation about that.

junpenglao · September 27, 2017, 6:31pm

Hi @mcavallaro, in ADVI the loss is the negative of the evidence lower bound (ELBO), averaged over a window of recent iterations. You cannot compare it across models, as it is not normalized. What people usually do is plot it, so you can check whether your model converges or not (at least to a local minimum). E.g.: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Minibatches
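To make "negative ELBO" concrete, here is a minimal NumPy sketch on a toy conjugate model. It is not PyMC3 code: the model (y_i ~ N(mu, 1), mu ~ N(0, 10^2)), the Gaussian q(mu) = N(m, s^2), the function names and the number of Monte Carlo draws are all assumptions for illustration. The loss is estimated as the average of -(log p(y, mu) - log q(mu)) over draws from q, so it is noisy and carries an arbitrary offset; that is why only its trend is meaningful. It can also be evaluated for any fixed q without running an optimizer.

```python
import numpy as np

def log_normal(x, mean, sd):
    """Elementwise log density of N(mean, sd**2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def neg_elbo(y, m, s, sigma=1.0, tau=10.0, n_draws=200, rng=None):
    """Monte Carlo estimate of -ELBO for y_i ~ N(mu, sigma^2), mu ~ N(0, tau^2),
    using the variational family q(mu) = N(m, s^2)."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu = m + s * rng.standard_normal(n_draws)                 # draws from q
    log_lik = log_normal(y[None, :], mu[:, None], sigma).sum(axis=1)
    log_prior = log_normal(mu, 0.0, tau)
    log_q = log_normal(mu, m, s)
    return -np.mean(log_lik + log_prior - log_q)

def log_evidence(y, sigma=1.0, tau=10.0):
    """Exact log p(y) for the same model: y ~ N(0, sigma^2 I + tau^2 11^T)."""
    n = len(y)
    cov = sigma ** 2 * np.eye(n) + tau ** 2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2 * np.pi) + logdet + quad)

y = np.random.default_rng(1).normal(1.5, 1.0, size=20)   # synthetic data

# Exact posterior of mu (sigma = 1, tau = 10); with q equal to it, -ELBO = -log p(y)
prec = len(y) / 1.0 ** 2 + 1 / 10.0 ** 2
m_post, s_post = y.sum() / prec, prec ** -0.5

print(neg_elbo(y, m_post, s_post), -log_evidence(y))  # equal up to rounding
print(neg_elbo(y, 5.0, 0.1))                           # a poor q gives a larger loss
```

Since ELBO <= log p(y), the loss is bounded below by -log p(y), which depends on the data and model; this is why raw loss values are not comparable across models.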

mcavallaro · September 27, 2017, 6:42pm

Great, thanks @junpenglao! Can I print the current value of the ELBO without running any variational fit? I think this would help to debug the model.

EDIT: what concerned me is that I get inf, as here:

Average Loss = inf: 100%|█████████████████████████| 1000/1000 [01:48<00:00,  9.18it/s]
Finished [100%]: Average Loss = 1,162.4

twiecki · September 28, 2017, 9:00am

@mcavallaro Maybe this is helpful: http://docs.pymc.io/notebooks/howto_debugging.html The model could be misspecified and seeing which values lead to infs in the logp might be enlightening.

mcavallaro · September 28, 2017, 12:09pm

@twiecki Thanks! Good to know that there is a canonical way to debug theano functions! In my case the loglikelihoods are finite, still, it’s strange that the Average Loss in my post above is reported to be infinite first and then equal to 1,162.4.
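One plausible explanation for seeing "inf" in the progress bar but a finite final number (an assumption, not confirmed in this thread) is that the progress-bar figure is a plain mean over a trailing window of recent losses: a single non-finite step anywhere in the window makes the displayed mean inf, while a summary that ignores non-finite values stays finite. The sketch below is hypothetical; `window_mean`, the window of 1000 iterations and the `finite_only` switch are illustrative and not a description of PyMC3's internals.

```python
import numpy as np

def window_mean(losses, i, window=1000, finite_only=False):
    """Mean loss over the trailing window ending at iteration i.

    window=1000 and finite_only are illustrative assumptions."""
    chunk = np.asarray(losses[max(0, i - window):i + 1], dtype=float)
    if finite_only:
        chunk = chunk[np.isfinite(chunk)]   # drop inf, -inf and nan
        if chunk.size == 0:
            return float("nan")
    return float(chunk.mean())

losses = np.full(2000, 1000.0)   # synthetic, otherwise well-behaved loss history
losses[1500] = np.inf            # a single non-finite step late in the run

print(window_mean(losses, 1999))                    # inf
print(window_mean(losses, 1999, finite_only=True))  # 1000.0
```

The same history can therefore be reported as inf or as a finite number depending only on how the window is summarized.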

junpenglao · September 28, 2017, 12:26pm

You can first check your model to make sure the logp is not inf:

twiecki · September 28, 2017, 12:38pm

It’s not uncommon for the ELBO to be inf initially and then converge; I wouldn’t worry about it.

mcavallaro · September 28, 2017, 5:08pm

thanks @twiecki

Still, I’m confused. Running

with model:
    f = pm.fit()

I find that the ELBO trace, f.hist, contains lots of inf values (just inspecting np.where(f.hist == np.inf); see the orange points in the picture below), but there are no warnings or errors…

[Figure: ADVI_hist, the ELBO history f.hist with the inf values marked as orange points]
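A note on the check `np.where(f.hist == np.inf)`: it only catches +inf. A nan never compares equal to anything and -inf is a different value, so both slip through. A minimal sketch of a fuller check, assuming the history is a 1-D NumPy array (the helper name `nonfinite_report` is hypothetical and the data is synthetic):

```python
import numpy as np

def nonfinite_report(hist):
    """Summarise where a loss history is not finite (+inf, -inf or nan)."""
    hist = np.asarray(hist, dtype=float)
    bad = np.flatnonzero(~np.isfinite(hist))   # indices of all non-finite entries
    return {
        "n_bad": int(bad.size),
        "first_bad": int(bad[0]) if bad.size else None,
        "n_posinf": int(np.sum(hist == np.inf)),
        "n_neginf": int(np.sum(hist == -np.inf)),
        "n_nan": int(np.sum(np.isnan(hist))),
    }

hist = np.array([5.0, np.inf, 3.0, np.nan, -np.inf, 2.0])   # synthetic history
print(np.where(hist == np.inf)[0])   # [1]: misses the nan and the -inf
print(nonfinite_report(hist))
```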

Conversely,

with model:
    s = pm.sample()

eventually raises the bad initialisation / model misspecification error.

Also, doing:

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

only finite values are printed, no inf or nan.

Is it possible to plot a trajectory for the stochastic nodes during the ADVI step (similarly to what we do with plot(f.hist))?

junpenglao · September 28, 2017, 6:40pm

Yes, you can track the parameters following this doc: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Tracking-parameters

I think this is an indication of some hidden problem in your model (e.g., some parameter goes negative and is then fed to something that needs a positive input).
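The failure mode described here can be reproduced with a plain NumPy normal log density: feeding it a scale that has drifted to zero or below gives nan, while mapping an unconstrained value through exp keeps the scale positive. As far as I know, PyMC3 already applies such a log transform to positive-only random variables, so in a real model the culprit would more likely be a derived expression. The function and values below are illustrative assumptions.

```python
import numpy as np

def normal_logp(x, mu, sd):
    """log N(x | mu, sd^2); evaluates to nan when sd <= 0 (outside the support)."""
    x, mu, sd = (np.asarray(v, dtype=float) for v in (x, mu, sd))
    with np.errstate(divide="ignore", invalid="ignore"):
        return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

raw = -0.3   # hypothetical unconstrained parameter that drifted negative

print(normal_logp(0.5, 0.0, raw))          # nan: raw value used directly as sd
print(normal_logp(0.5, 0.0, np.exp(raw)))  # finite: exp(raw) is always > 0
```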

mcavallaro · September 29, 2017, 12:01pm

Thanks! Tracking the parameters reveals that none of them goes to infinity during ADVI, so I’m still looking for the problem.

I’m investigating this… the model seems OK, though.

schluedj · October 18, 2017, 11:56pm

Hi everyone,

I was experiencing a similar average loss inf problem in some of my models since updating to 3.2 and was able to recreate it in an extremely simple regression model (the models didn’t produce this in earlier versions of pymc3). It appears as though the model converges but then produces inf values for average loss. Code here: https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb

Any ideas what this might be in such a simple model?

Thanks,

David

junpenglao · October 19, 2017, 7:07am

Hi David,
During the approximation, some parameters might go outside of their support, which causes the -inf. Judging from the loss history, the fit should be fine though. You should track the parameters of the approximation (see instructions here) and pin down what is causing the -inf.
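The tracking workflow suggested here (record parameter values at every iteration, then look them up at the iterations where the loss is not finite) can be sketched without PyMC3. `SimpleTracker` below is a hypothetical stand-in, not `pm.callbacks.Tracker`, and the run is synthetic. It also illustrates that an unconstrained parameter (`log_sd`) can stay perfectly finite while a derived quantity (`sd`) underflows, so tracking only the raw parameters may not expose the problem.

```python
import numpy as np

class SimpleTracker:
    """Hypothetical tracking callback: on every call it evaluates each named
    function and appends the result to that name's history."""

    def __init__(self, **fns):
        self.fns = fns
        self.hist = {name: [] for name in fns}

    def __call__(self):
        for name, fn in self.fns.items():
            self.hist[name].append(float(fn()))

    def __getitem__(self, name):
        return np.asarray(self.hist[name])

# Synthetic run: the unconstrained log_sd drifts steadily downwards
state = {"log_sd": 0.0}
tracker = SimpleTracker(log_sd=lambda: state["log_sd"],
                        sd=lambda: np.exp(state["log_sd"]))
losses = []
with np.errstate(divide="ignore", over="ignore", under="ignore"):
    for step in range(100):
        state["log_sd"] -= 10.0
        sd = np.exp(state["log_sd"])
        losses.append(1.0 / sd)   # toy loss term that needs sd > 0
        tracker()

losses = np.asarray(losses)
first_bad = int(np.argmax(~np.isfinite(losses)))   # first non-finite iteration
print(first_bad, tracker["log_sd"][first_bad], tracker["sd"][first_bad])
```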

mcavallaro · October 19, 2017, 10:02am

@schluedj and @junpenglao That’s very similar to my issue above. Also, doing

with model_vi:
    M = pm.sample(draws=100)

I get:

ValueError: Bad initial energy: nan. The model might be misspecified.

Which sounds like a serious problem, and prevents the use of NUTS.

I tried to set

sigma = pm.Bound(pm.HalfCauchy, lower=0.1)('sigma', beta=10, testval=1.)

in David’s model, in order to avoid zero values of variance, but that didn’t help.

schluedj · October 19, 2017, 12:04pm

Hi @junpenglao,

Thanks! Yeah, I tracked the parameters for the mean-field approximation and all looked okay. However, it’s the full-rank one that’s giving me problems. For the full rank, we’ve got the mean of the approximation and the lower triangle of the Cholesky decomposition of the covariance matrix of the variational distribution (which is what I’m assuming L_tril stands for). How do I go about tracking the covariance matrix?

I’ve updated https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb to include the variable tracking, but for the full rank I just tracked the stds of the covariance matrix. It looks like those values hit zero around the time I start getting infinite values.
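If the tracked L_tril values are stored one packed vector per iteration, both the per-dimension stds and the diagonal of the Cholesky factor can be recovered in one vectorised pass. The sketch assumes the packing follows `np.tril_indices(n)` order, as the `L2full` helper later in the thread does; `chol_summaries` is a hypothetical name and the history is synthetic. Note that the covariance can become singular (a zero on the diagonal of L) while every std stays well away from zero, so the diagonal of L may be the more telling quantity to watch.

```python
import numpy as np

def chol_summaries(L_hist, n):
    """Unpack each tracked Cholesky factor L and return (stds, diag), where
    stds[t] = sqrt(diag(L L^T)) and diag[t] = diag(L) at iteration t.

    Assumes each row of L_hist holds the n*(n+1)//2 lower-triangular
    entries in np.tril_indices(n) order."""
    L_hist = np.asarray(L_hist, dtype=float)
    rows, cols = np.tril_indices(n)
    L = np.zeros((L_hist.shape[0], n, n))
    L[:, rows, cols] = L_hist
    stds = np.sqrt(np.sum(L ** 2, axis=2))    # row norms of L = marginal stds
    diag = np.diagonal(L, axis1=1, axis2=2)   # det(cov) = prod(diag) ** 2
    return stds, diag

# Synthetic history: 3 iterations of a 2-d factor packed as (L00, L10, L11)
L_hist = np.array([[1.0, 0.0, 1.0],
                   [0.8, 0.6, 0.1],
                   [0.8, 0.6, 0.0]])
stds, diag = chol_summaries(L_hist, 2)
print(stds[-1])                                          # no std is zero here...
print(np.flatnonzero(np.abs(diag).min(axis=1) < 1e-3))   # [2]: ...but cov is singular
```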

schluedj · October 19, 2017, 12:09pm

Hi @mcavallaro,

Thanks for looking into this. I actually didn’t have a problem fitting mcmc on a smaller dataset. My “model_vi” model has a mini-batch setup; perhaps you’re getting the ValueError because you’re trying to run sample in a model where the data is mini-batched?
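For context on the minibatch point: minibatch ADVI replaces the full-data log-likelihood with a rescaled minibatch estimate (in PyMC3 this rescaling is what the `total_size` argument is for, as far as I know). Each minibatch gives a different, noisy value, which is fine for stochastic optimisation but is a different target from the deterministic full-data density that NUTS expects; whether that explains the ValueError here is an assumption. A minimal sketch of the rescaling with synthetic data (function names are hypothetical):

```python
import numpy as np

def log_normal(x, mean, sd):
    """Elementwise log density of N(mean, sd**2)."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mean) / sd) ** 2

def scaled_minibatch_loglik(batch, total_size, mu, sd):
    """Estimate the full-data log-likelihood from one minibatch by scaling
    the minibatch sum by total_size / batch_size."""
    return total_size / len(batch) * log_normal(batch, mu, sd).sum()

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=1000)    # synthetic data set
full = log_normal(data, 2.0, 1.0).sum()   # exact full-data log-likelihood

# Each batch gives a different estimate; averaging over all 10 disjoint
# batches of 100 recovers the full-data value.
batches = data.reshape(10, 100)
estimates = [scaled_minibatch_loglik(b, len(data), 2.0, 1.0) for b in batches]
print(full, np.mean(estimates), np.std(estimates))
```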

D

junpenglao · October 19, 2017, 12:22pm

You can track the cov of the full rank approximation via:

import matplotlib.pyplot as plt
import pymc3 as pm

with model_vi:
    advi = pm.FullRankADVI()
    tracker = pm.callbacks.Tracker(
        mean=advi.approx.params[1].eval,   # callable that returns the mean vector
        L_tril=advi.approx.params[0].eval  # callable that returns the packed lower-triangular Cholesky factor
    )
    approx = advi.fit(50000, callbacks=[tracker])

fig = plt.figure(figsize=(16, 9))
mu_ax = fig.add_subplot(221)
std_ax = fig.add_subplot(222)
hist_ax = fig.add_subplot(212)
mu_ax.plot(tracker['mean'][8000:])
mu_ax.set_title('Mean track')
std_ax.plot(tracker['L_tril'][8000:])
std_ax.set_title('L_tril track')
hist_ax.plot(advi.hist[8000:])
hist_ax.set_title('Negative ELBO track');

To recover the covariance matrix from the packed lower-triangular Cholesky factor:

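import numpy as np

n = approx.ddim  # dimension of the approximation

def L2full(L, n):
    # Unpack the lower-triangular vector into an n x n matrix and return the covariance L L^T
    L_tril = np.zeros((n, n))
    L_tril[np.tril_indices(n)] = L
    return L_tril.dot(L_tril.T)

# Packed factors at the iterations where the loss was not finite, and their covariances
L_inf = np.asarray(tracker['L_tril'])[~np.isfinite(advi.hist), :]
cov_inf = [L2full(L, n) for L in L_inf]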

I had a quick look, the estimation (both mu and L_tril) looks quite OK. I am not completely sure where the inf comes from…

schluedj · October 19, 2017, 12:38pm

@junpenglao

Thanks for the detailed reply.

Like I said before, for me this is something that has popped up in more complicated models that worked in previous versions of pymc3. One “sort of” fix that I found with some of these models was to change the obj_optimizer and learning rate (through a lot of trial and error). However, I did not have universal success with doing this.
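Why changing the optimizer or learning rate can make infs appear or disappear is easy to see on a toy problem unrelated to ADVI: plain gradient descent on f(x) = x^2 multiplies x by (1 - 2 lr) at each step, so any lr > 1 makes |x| grow geometrically until the loss overflows to inf and then becomes nan. This is only an analogy under assumed values; ADVI's optimizers are adaptive, and the thread does not establish that the step size caused these particular infs.

```python
import numpy as np

def gd_loss_history(lr, x0=1.0, steps=2000):
    """Plain gradient descent on f(x) = x**2 (gradient 2x); returns f at each step."""
    x = np.float64(x0)
    hist = np.empty(steps)
    with np.errstate(over="ignore", invalid="ignore"):
        for t in range(steps):
            hist[t] = x * x
            x = x - lr * 2.0 * x   # each step multiplies x by (1 - 2 * lr)
    return hist

small = gd_loss_history(lr=0.1)   # factor 0.8: converges towards 0
large = gd_loss_history(lr=1.5)   # factor -2: |x| doubles until it overflows
print(small[-1], np.isfinite(large).all())
```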

junpenglao · October 19, 2017, 1:32pm

Hi David,
It would be great if you could open a new post with a bit more detail on the problems you are having. My hunch is that (at least in this example) some denominators become too small. While this doesn't affect the final estimate (as the model converges to a local minimum), it is not ideal and could indicate bias somewhere.
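One concrete way a "too small denominator" can turn the loss into inf, offered as a hypothesis consistent with the stds hitting zero above: for a full-rank Gaussian q with Cholesky factor L, the term sum(log|L_ii|) enters the ELBO, whether through an analytic entropy or through log q evaluated at sampled points. If a diagonal entry of L collapses to zero, that term is -inf, so -ELBO is +inf even while the means look fine. A minimal NumPy sketch (function name and matrices are illustrative):

```python
import numpy as np

def gaussian_entropy_from_chol(L):
    """Entropy of N(m, L L^T) computed from its lower-triangular Cholesky factor L."""
    n = L.shape[0]
    with np.errstate(divide="ignore"):
        log_diag = np.log(np.abs(np.diag(L)))   # log|L_ii|, -inf if L_ii == 0
    return 0.5 * n * (1.0 + np.log(2 * np.pi)) + log_diag.sum()

L_ok = np.array([[1.0, 0.0],
                 [0.3, 0.5]])
L_bad = np.array([[1.0, 0.0],
                  [0.3, 0.0]])   # a diagonal entry has collapsed to zero

print(gaussian_entropy_from_chol(L_ok))
print(gaussian_entropy_from_chol(L_bad))   # -inf, so -ELBO = -(E_q[log p] + H) is +inf
```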

schluedj · October 19, 2017, 1:50pm

Hi Junpeng,

Will do. I’ll do some backtraces of the more complicated models and open a new post once I get it all organized. Thanks again

mcavallaro · October 19, 2017, 4:45pm

Yes, indeed! Thanks!

Related topics

Topic	Category	Replies	Views	Activity
Infinite loss with ADVI	Questions	1	1102	January 24, 2018
Inf Average loss ADVI in Correlated Topic Model	Questions	9	1513	July 25, 2018
Negative "Average loss" in ADVI	Questions	4	1378	April 15, 2019
Average loss in ADVI optimization	Questions	3	1331	July 23, 2018
Average loss for ADVI never decrease for my model		6	366	September 3, 2023