Average Loss in optimization output


Hi, could someone explain what exactly the average loss is that appears in the progress bar during initialisation (with ADVI)? What behaviour should we expect from that number?

Average Loss = 2.1759e+07: 100%|██████████| 10000/10000 [02:11<00:00, 89.31it/s]

I’m new to variational inference, and I didn’t find any documentation about that.


Hi @mcavallaro, in ADVI the loss is the negative of the evidence lower bound (ELBO), averaged over some window. You cannot compare it across models, as it is not normalized. What people usually do is plot it to check whether the model has converged (to a local minimum, at least), e.g.: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Minibatches
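
One way to picture "the loss within some window" is a running mean over the most recent iterations. The sketch below is only an illustration of that idea: the helper name `windowed_average`, the window size, and the loss values are all made up, and this is not PyMC3's own code. It shows why a single non-finite loss can make the displayed average inf for a while, and why the number becomes finite again once the window has moved past it.

```python
import math


def windowed_average(history, window):
    """Mean of the last `window` entries of `history` (fewer if not yet available)."""
    recent = history[-window:]
    return sum(recent) / len(recent)


# Hypothetical loss history: one early non-finite value, then finite losses.
history = [math.inf, 1300.0, 1200.0, 1150.0, 1100.0]

# While the inf is still inside the window, the average is inf.
early = windowed_average(history[:3], window=3)

# Once the window has moved past it, the average is finite again.
late = windowed_average(history, window=3)

print(early, late)
```

Under this assumption, an "Average Loss = inf" on the progress bar followed by a finite final value would simply mean that the non-finite entries dropped out of the window before the end of the fit.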


Great, thanks @junpenglao! Can I print the current value of the ELBO without running any variational fit? This would help to debug the model, I think.

EDIT: what concerned me is that I get inf, as here:

Average Loss = inf: 100%|█████████████████████████| 1000/1000 [01:48<00:00,  9.18it/s]
Finished [100%]: Average Loss = 1,162.4


@mcavallaro Maybe this is helpful: http://docs.pymc.io/notebooks/howto_debugging.html . The model could be misspecified, and seeing which values lead to infs in the logp might be enlightening.


@twiecki Thanks! Good to know that there is a canonical way to debug theano functions! In my case the loglikelihoods are finite, still, it’s strange that the Average Loss in my post above is reported to be infinite first and then equal to 1,162.4.


You can first check your model to make sure the logp is not inf:


It’s not uncommon for the ELBO to be inf initially and then converge; I wouldn’t worry about it.


thanks @twiecki

Still I’m confused. Running

with model: f = pm.fit()

I find that the ELBO trace, f.hist, contains lots of inf values (just inspecting np.where(f.hist == np.inf); see the orange points in the pic below), but there are no warnings or errors…
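
A small aside on that inspection: comparing against np.inf only catches +inf, so -inf and nan entries in the history would go unnoticed. A minimal sketch, using a made-up array in place of f.hist, of how np.isfinite catches all three:

```python
import numpy as np

# Hypothetical loss history standing in for f.hist.
hist = np.array([np.inf, 1300.0, -np.inf, np.nan, 1100.0])

# Equality with np.inf matches +inf only.
only_pos_inf = np.where(hist == np.inf)[0]

# ~np.isfinite matches +inf, -inf and nan.
non_finite = np.where(~np.isfinite(hist))[0]

print(only_pos_inf, non_finite)
```

The second index array is the one to use when looking up which iterations (and which tracked parameter values) coincide with a non-finite loss.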



with model: s = pm.sample()

eventually raises the bad initialisation / model misspecification error.

Also, doing:

for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))

only finite values are printed, no inf or nan.

Is it possible to plot a trajectory for the stochastic nodes during the ADVI step (similar to what we do with plot(f.hist))?


Yes, you can track the parameters following this doc: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Tracking-parameters

I think this is an indication of some hidden problem in your model (e.g., some parameter goes negative and is then fed to something that needs a positive input).


Thanks! Tracking the parameters reveals that none of them goes to infinity during ADVI, so I’m still looking for the problem.

I’m investigating this… the model seems OK, though.


Hi everyone,

I was experiencing a similar average loss inf problem in some of my models since updating to 3.2 and was able to recreate it in an extremely simple regression model (the models didn’t produce this in earlier versions of pymc3). It appears as though the model converges but then produces inf values for average loss. Code here: https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb

Any ideas what this might be in such a simple model?




Hi David,
During the approximation, some parameters might have gone outside their support, which causes the -inf. Looking at the loss history, the fit should be fine though. You should track the parameters of the approximation (see the instructions here) and pin down what is causing the -inf.


@schluedj and @junpenglao That’s very similar to my issue above. Also, doing

with model_vi: M = pm.sample(draw=100)

I get:

ValueError: Bad initial energy: nan. The model might be misspecified.  

Which sounds like a serious problem, and prevents the use of NUTS.

I tried to set

sigma = pm.Bound(pm.HalfCauchy, lower=0.1)('sigma', beta=10, testval=1.)

in David’s model, in order to avoid zero values of variance, but that didn’t help.


Hi @junpenglao,

Thanks! Yeah, I tracked the parameters for the mean-field approximation and all looked okay. However, it’s the full-rank approximation that’s giving me problems. For the full rank, we’ve got the mean of the approximation and the lower triangle of the Cholesky decomposition of the covariance matrix of the variational distribution (which is what I’m assuming L_tril stands for). How do I go about tracking the covariance matrix?

I’ve updated https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb to include the variable tracking, but for the full rank, I just tracked the stds of the covariance matrix. Looks like those values hit zero around when I start getting infinite values.


Hi @mcavallaro,

Thanks for looking into this. I actually didn’t have a problem fitting mcmc on a smaller dataset. My “model_vi” model has a mini-batch setup; perhaps you’re getting the ValueError because you’re trying to run sample in a model where the data is mini-batched?

  • D


You can track the cov of the full rank approximation via:

import matplotlib.pyplot as plt
import pymc3 as pm

with model_vi:
    advi = pm.FullRankADVI()
    tracker = pm.callbacks.Tracker(
        mean=advi.approx.params[1].eval,    # callable that returns the mean
        L_tril=advi.approx.params[0].eval,  # callable that returns the flattened lower-triangular Cholesky factor
    )
    approx = advi.fit(50000, callbacks=[tracker])

fig = plt.figure(figsize=(16, 9))
mu_ax = fig.add_subplot(221)
std_ax = fig.add_subplot(222)
hist_ax = fig.add_subplot(212)
mu_ax.set_title('Mean track')
std_ax.set_title('Std track')
hist_ax.set_title('Negative ELBO track');

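To recover the cov from the lower triangular matrix:

import numpy as np

n = approx.ddim

def L2full(L, n):
    # Unpack the flat vector into a lower-triangular matrix, then form L L^T.
    L_tril = np.zeros((n, n))
    L_tril[np.tril_indices(n)] = L
    return L_tril.dot(L_tril.T)

# Keep only the tracked L_tril values from iterations with a non-finite loss.
L_inf = np.asarray(tracker['L_tril'])[~np.isfinite(advi.hist), :]
L_inf = [L2full(L, n) for L in L_inf]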
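
As a quick sanity check of L2full, here is a worked 2x2 case with made-up numbers. It assumes, as the function above does, that the flat vector is packed in the row-major order given by np.tril_indices; whether that matches the packing of the tracked L_tril in every PyMC3 version is not verified here. The standard deviations are the square roots of the diagonal of the recovered covariance.

```python
import numpy as np


def L2full(L, n):
    # Unpack the flat vector into a lower-triangular matrix, then form L L^T.
    L_tril = np.zeros((n, n))
    L_tril[np.tril_indices(n)] = L
    return L_tril.dot(L_tril.T)


# Hypothetical flat Cholesky factor for n = 2: [[1, 0], [2, 3]].
L_flat = np.array([1.0, 2.0, 3.0])

cov = L2full(L_flat, 2)       # [[1, 2], [2, 13]]
stds = np.sqrt(np.diag(cov))  # [1, sqrt(13)]

print(cov)
print(stds)
```

If a diagonal entry of the Cholesky factor collapses to zero, the corresponding variance can still be positive through the off-diagonal terms, so it may be worth inspecting the diagonal of L_tril itself as well as the stds.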

I had a quick look, the estimation (both mu and L_tril) looks quite OK. I am not completely sure where the inf comes from…



Thanks for the detailed reply.

Like I said before, for me this is something that has popped up in more complicated models that worked in previous versions of pymc3. One “sort of” fix that I found with some of these models was to change the obj_optimizer and learning rate (through a lot of trial and error). However, I did not have universal success with doing this.


Hi David,
It would be great if you could open a new post with a bit more detail on the problems you are having. My hunch is that (at least in this example) some denominators become too small. While this doesn’t affect the final estimate (as the model converges to a local minimum), it is not ideal and could indicate bias somewhere.
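
To illustrate the kind of effect that hunch describes (and only that; this is not a diagnosis of the model in the notebook), here is a hand-written normal log-density in plain Python. The function name and the numbers are made up. With an extremely small standard deviation in the denominator, the squared standardized residual overflows double precision and the log-density comes out as -inf, even though every input is finite.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log-density of Normal(mu, sigma) at x, written out term by term."""
    z = (x - mu) / sigma
    # z * z overflows to inf in double precision when sigma is tiny.
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


ok = normal_logpdf(1.0, 0.0, 1.0)       # ordinary scale: finite
tiny = normal_logpdf(1.0, 0.0, 1e-200)  # tiny denominator: -inf

print(ok, tiny)
```

A loss built from such terms would be non-finite on the iterations where the scale collapses and finite again once it recovers, which is consistent with isolated inf entries in an otherwise converging history.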


Hi Junpeng,

Will do. I’ll do some backtraces of the more complicated models and open a new post once I get it all organized. Thanks again


Yes, indeed! Thanks!