BNN Average Loss inf or NaN on MNIST dataset

Hello,

I am currently learning about and experimenting with Bayesian neural networks. I am trying to compare the results of a simple NN vs. a BNN.

Using the example at https://twiecki.io/blog/2016/06/01/bayesian-deep-learning/, I started by replacing the moons dataset with an MNIST dataset (Fashion-MNIST in this case).

I created a subset with 2 of the initial 10 classes, and I also scaled the images.
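Roughly, the data preparation looks like this (a minimal sketch, not the exact notebook code; I am assuming keras.datasets for loading Fashion-MNIST and scikit-learn's MinMaxScaler for the scaling):

```python
import numpy as np
from keras.datasets import fashion_mnist
from sklearn.preprocessing import MinMaxScaler

# Load Fashion-MNIST and keep only two of the ten classes
(X_train, y_train), _ = fashion_mnist.load_data()
mask = np.isin(y_train, [0, 1])                      # e.g. classes 0 and 1
X = X_train[mask].reshape(-1, 28 * 28).astype('float32')  # flatten 28x28 images
y = (y_train[mask] == 1).astype('int8')              # binary labels 0/1

# Scale pixel values to [0, 1]
X = MinMaxScaler().fit_transform(X)
```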

The code can be found here: https://github.com/nlucian/bayesian_neural_network_vs_nn/blob/master/bayesian_neural_networks02-for_online.ipynb

Unfortunately, for some reason the Average Loss is stuck at inf/NaN:
Average Loss = inf: 19%|█▊ | 46620/250000

The things I tried:

  • using relu/tanh/sigmoid activation functions
  • reducing the number of images to 500 before loading them into the model
  • trying different sizes for the hidden layers

I am not really sure what the reason could be - even if I reduce the dimensionality to something absurd like 2 (as in the moons example), the Average Loss remains the same :frowning:

Could you please give me some advice? I have the example working with Lasagne, but I would like to know if it is possible to do it this way, to get a better understanding of the inner workings before moving on to something more advanced.

Thank you,
Lucian

Hi, I think the optimization is going wrong. The learning rate or initial sigma may be too large. Try checking the test point: if you have inf in the logp, that indicates the model is wrong.
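Something along these lines should work for checking the test point (a minimal sketch, assuming `neural_network` is the pm.Model built in the notebook):

```python
import pymc3 as pm

# check_test_point() reports the log-probability of every free variable
# evaluated at the model's test point; a -inf or nan entry points at the
# part of the model (or the data) that is broken.
print(neural_network.check_test_point())
```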


Hello! Thank you for your reply.

I just tried various values for obj_optimizer=pm.sgd(learning_rate=0.0005), but with no success.

I also tried different values of sigma in the range [0, 10].
I will try adding a new hidden layer.
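For reference, this is roughly how I am passing the optimizer to the fit call (a minimal sketch; `neural_network` is the model from the notebook and the iteration count is the one from the progress bar above):

```python
import pymc3 as pm

with neural_network:
    # mean-field ADVI; obj_optimizer controls the gradient step size
    approx = pm.fit(n=250000,
                    obj_optimizer=pm.sgd(learning_rate=0.0005))
```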

It seems that it works if I divide the dataset by np.float32(256).
I am still not sure why, though - I will investigate further.

That affects the posterior required to solve the task, and this may be the issue. If the input is not scaled, the KL term would be huge.


Yeah, I used the scalers from scikit-learn, but after I passed the values through the dimensionality reduction algorithm, they were messed up again.
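So the fix is probably to rescale after the dimensionality reduction rather than before - something like this (a minimal sketch, assuming scikit-learn's PCA and MinMaxScaler stand in for the reduction and scaling steps in my notebook):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# X: flattened images, shape (n_samples, 784)
X_reduced = PCA(n_components=2).fit_transform(X)

# PCA output is no longer in [0, 1], so scale it again before feeding the model
X_scaled = MinMaxScaler().fit_transform(X_reduced).astype(np.float32)
print(X_scaled.min(), X_scaled.max())  # sanity check: should be 0.0 and 1.0
```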