# Average Loss in optimization output

Hi, could someone explain what exactly the `Average Loss` shown in the progress bar during initialisation (with ADVI) is? What behaviour should we expect from that number?

```
Average Loss = 2.1759e+07: 100%|██████████| 10000/10000 [02:11<00:00, 89.31it/s]
```

I’m new to variational inference, and I didn’t find any documentation about that.

Hi @mcavallaro, in ADVI the loss is the negative of the evidence lower bound (ELBO), averaged within some window. You cannot compare it across models, as it is not normalized. What people usually do is plot it so you can check whether your model converges or not (converges to a local minimum, at least). e.g.: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Minibatches
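To make that concrete: the `Average Loss` in the progress bar is roughly a running average of the per-iteration negative ELBO. A toy NumPy sketch (the `losses` values are made up, not from a real fit):

```python
import numpy as np

# hypothetical per-iteration negative-ELBO values (stand-in for an ADVI loss trace)
losses = np.array([10.0, 8.0, 6.0, 5.0, 4.0, 4.0])

window = 3
# a moving average over a window, similar in spirit to the reported "Average Loss"
avg = np.convolve(losses, np.ones(window) / window, mode='valid')
print(avg)  # averages of each 3-iteration window
```

Because of the averaging, the displayed number lags behind the instantaneous loss, which is why it can change between the progress-bar value and the final reported value.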

Great, thanks @junpenglao ! Can I print the current value of the ELBO without running any variational fit? This would help to debug the model, I think.

EDIT: what concerns me is that I get `inf`, as here:

```
Average Loss = inf: 100%|█████████████████████████| 1000/1000 [01:48<00:00,  9.18it/s]
Finished [100%]: Average Loss = 1,162.4
```

@mcavallaro Maybe this is helpful: http://docs.pymc.io/notebooks/howto_debugging.html The model could be misspecified and seeing which values lead to infs in the logp might be enlightening.

@twiecki Thanks! Good to know that there is a canonical way to debug Theano functions! In my case the log-likelihoods are finite; still, it’s strange that the `Average Loss` in my post above is reported as infinite first and then equal to `1,162.4`.

You can first check your model to make sure the logp is not inf.

It’s not uncommon for the ELBO to be inf initially and then converge; I wouldn’t worry about it.

thanks @twiecki

Still I’m confused. Running

```
with model:
    f = pm.fit()
```

I find that the trace for the ELBO, `f.hist`, contains lots of `inf` values (just inspect `np.where(f.hist == np.inf)`; see the orange points in the pic below), but no warnings or errors are raised…
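For reference, a tiny self-contained NumPy sketch of that check, with a made-up `hist` standing in for `f.hist`:

```python
import numpy as np

# hypothetical loss history with some infinite entries (stand-in for f.hist)
hist = np.array([np.inf, 5.0, np.inf, 3.0, 2.5])

# indices of iterations where the recorded loss is infinite
inf_idx = np.where(np.isinf(hist))[0]
print(inf_idx)  # → [0 2]

# fraction of iterations that reported an infinite loss
print(inf_idx.size / hist.size)  # → 0.4
```

`np.isinf` catches both `+inf` and `-inf`, which is a bit safer than comparing against `np.inf` directly.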

Conversely,

```
with model:
    s = pm.sample()
```

eventually raises the bad initialisation / model misspecification error.

Also, doing:

```
for RV in model.basic_RVs:
    print(RV.name, RV.logp(model.test_point))
```

only finite values are printed, no `inf` or `nan`.

Is it possible to plot a trajectory of the stochastic nodes during the ADVI steps (similar to what we do with `plot(f.hist)`)?

Yes, you can track the parameters following this doc: http://docs.pymc.io/notebooks/variational_api_quickstart.html#Tracking-parameters

I think this is an indication of some hidden problem in your model (e.g., some parameter goes negative and is then fed to something that needs a positive input).

Thanks! Tracking the parameters reveals that none of them goes to infinity during ADVI, so I’m still looking for the problem.

I’m investigating this… the model seems OK, though.

Hi everyone,

I was experiencing a similar average-loss inf problem in some of my models since updating to 3.2, and was able to recreate it in an extremely simple regression model (the models didn’t produce this in earlier versions of PyMC3). It appears as though the model converges but then produces inf values for the average loss. Code here: https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb

Any ideas what this might be in such a simple model?

Thanks,

David

Hi David,
During the approximation, some parameters might have gone outside of their supported range, which causes the -inf. Looking at the loss history, the fitting should be fine though. You should track the parameters of the approximation (see instructions here) and pin down what is causing the -inf.

@schluedj and @junpenglao That’s very similar to my issue above. Also, doing

```
with model_vi:
    M = pm.sample(draws=100)
```

I get:

```
ValueError: Bad initial energy: nan. The model might be misspecified.
```

Which sounds like a serious problem, and prevents the use of NUTS.

I tried to set

```
sigma = pm.Bound(pm.HalfCauchy, lower=0.1)('sigma', beta=10, testval=1.)
```

in David’s model, in order to avoid zero values of variance, but that didn’t help.

Hi @junpenglao,

Thanks! Yeah, I tracked the parameters for the mean-field approximation and all looked okay. However, it’s the full-rank approximation that’s giving me problems. For the full rank, we’ve got the mean of the approximation and the lower triangular of the Cholesky decomposition of the covariance matrix of the variational distro (which is what I’m assuming `L_tril` stands for). How do I go about tracking the covariance matrix?

I’ve updated https://github.com/schluedj/test_examples/blob/master/Simple_Regression_infs.ipynb to include the variable tracking, but for the full rank, I just tracked the stds of the covariance matrix. Looks like those values hit zero around when I start getting infinite values.

Hi @mcavallaro,

Thanks for looking into this. I actually didn’t have a problem fitting mcmc on a smaller dataset. My “model_vi” model has a mini-batch setup; perhaps you’re getting the ValueError because you’re trying to run sample in a model where the data is mini-batched?

• D

You can track the cov of the full rank approximation via:

```
with model_vi:
    advi = pm.FullRankADVI()
    tracker = pm.callbacks.Tracker(
        mean=advi.approx.params[1].eval,    # callable that returns the mean
        L_tril=advi.approx.params[0].eval,  # callable that returns the flattened L_tril
    )
    approx = advi.fit(10000, callbacks=[tracker])

fig, (mu_ax, std_ax, hist_ax) = plt.subplots(1, 3, figsize=(16, 9))
mu_ax.plot(tracker['mean'][8000:])
mu_ax.set_title('Mean track')
std_ax.plot(tracker['L_tril'][8000:])
std_ax.set_title('L_tril track')
hist_ax.plot(advi.hist)
hist_ax.set_title('Negative ELBO track');
```

To recover the covariance from the lower-triangular matrix:

```
n = approx.ddim

def L2full(L, n):
    # unpack the flattened lower triangle, then form L @ L.T
    L_tril = np.zeros((n, n))
    L_tril[np.tril_indices(n)] = L
    return L_tril.dot(L_tril.T)

# L_inf here is the tracked history of flattened `L_tril` values
L_inf = [L2full(L, n) for L in L_inf]
```

I had a quick look, the estimation (both `mu` and `L_tril`) looks quite OK. I am not completely sure where the inf comes from…

