BNN Average Loss inf or NaN on MNIST dataset

Hello,

I am currently learning about and experimenting with Bayesian neural networks. I am trying to compare the results of a simple NN vs. a BNN.

Using the example at https://twiecki.io/blog/2016/06/01/bayesian-deep-learning/, I started by replacing the moons dataset with an MNIST dataset (Fashion-MNIST in this case).

I created a subset with 2 of the initial 10 classes, and I also scaled the images.
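Roughly, the data preparation looks like this (a minimal sketch, not the exact notebook code; I am assuming keras.datasets for loading Fashion-MNIST and scikit-learn's MinMaxScaler for the scaling):

```python
import numpy as np
from keras.datasets import fashion_mnist
from sklearn.preprocessing import MinMaxScaler

# Load Fashion-MNIST and keep only two of the ten classes
(X_train, y_train), _ = fashion_mnist.load_data()
mask = np.isin(y_train, [0, 1])                      # e.g. classes 0 and 1
X = X_train[mask].reshape(-1, 28 * 28).astype('float32')  # flatten 28x28 images
y = (y_train[mask] == 1).astype('int8')              # binary labels 0/1

# Scale pixel values to [0, 1]
X = MinMaxScaler().fit_transform(X)
```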

The code can be found here: https://github.com/nlucian/bayesian_neural_network_vs_nn/blob/master/bayesian_neural_networks02-for_online.ipynb

Unfortunately, for some reason the Average Loss is stuck at inf/NaN:
Average Loss = inf: 19%|█▊ | 46620/250000

The things I tried:

  • using relu/tanh/sigmoid activation functions
  • reducing the number of images to 500 before loading them into the model
  • trying different sizes for the hidden layers

I am not really sure what the reason could be - even if I reduce the dimensionality to something absurd like 2 (as in the moons example), the Average Loss remains the same :frowning:

Could you please give me some advice? I have the example working with Lasagne, but I would like to know if it is possible to do it this way, to get a better understanding of the inner workings before moving on to something more advanced.

Thank you,
Lucian

Hi, I think the optimization is going wrong. The learning rate or initial sigma may be too large. Try checking the test point: if you have inf in the logp, that indicates the model is wrong.
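Something along these lines should work for checking the test point (a minimal sketch, assuming `neural_network` is the pm.Model built in the notebook):

```python
import pymc3 as pm

# check_test_point() reports the log-probability of every free variable
# evaluated at the model's test point; a -inf or nan entry points at the
# part of the model (or the data) that is broken.
print(neural_network.check_test_point())
```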


Hello! Thank you for your reply.

I just tried various values for obj_optimizer=pm.sgd(learning_rate=0.0005), but with no success.

I also tried different values of sigma in the range [0, 10].
I will try adding a new hidden layer.
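For reference, this is roughly how I am passing the optimizer to the fit call (a minimal sketch; `neural_network` is the model from the notebook and the iteration count is the one from the progress bar above):

```python
import pymc3 as pm

with neural_network:
    # mean-field ADVI; obj_optimizer controls the gradient step size
    approx = pm.fit(n=250000,
                    obj_optimizer=pm.sgd(learning_rate=0.0005))
```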

It seems that it works if I divide the dataset by np.float32(256).
I am still not sure why, though - I will investigate further.

That affects the posterior required to solve the task, and this may be the issue. If the input is not scaled, the KL term would be huge.


Yeah, I used the scalers from scikit-learn, but after I passed the values through the dimensionality reduction algorithm, they were messed up again.
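So the fix is probably to rescale after the dimensionality reduction rather than before - something like this (a minimal sketch, assuming scikit-learn's PCA and MinMaxScaler stand in for the reduction and scaling steps in my notebook):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# X: flattened images, shape (n_samples, 784)
X_reduced = PCA(n_components=2).fit_transform(X)

# PCA output is no longer in [0, 1], so scale it again before feeding the model
X_scaled = MinMaxScaler().fit_transform(X_reduced).astype(np.float32)
print(X_scaled.min(), X_scaled.max())  # sanity check: should be 0.0 and 1.0
```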