Hierarchical Bayesian Neural Networks with Informative Priors by @twiecki

New blog post by @twiecki

Very cool stuff, as per usual :slight_smile:

One thing I can’t help but notice is that the number of hidden units (n_hidden) is very small here: just 5 neurons per layer. When I played around with twiecki’s previous Bayesian NN examples and ran my own experiments, I hit really severe non-identifiability and multimodality in the posterior as the number of neurons per layer increased.

ADVI fails badly as the number of weights increases, since both mode-seeking and mode-covering behaviour give a poor approximation of a multimodal posterior, and NUTS takes forever and also struggles with the model structure. To me this is the biggest challenge facing the application of Bayesian NNs, and I dunno if it’s been satisfactorily solved yet.
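
To make the non-identifiability concrete, here is a minimal NumPy sketch. It assumes a single hidden layer with a tanh activation and no biases, which may not match the blog post's exact architecture, and all names (`forward`, `W_in`, `W_out`) are made up for illustration. It shows that reordering the hidden units, or flipping the signs of the weights into and out of the tanh layer, leaves the network output unchanged, so each posterior mode has many exact copies.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 5

X = rng.normal(size=(10, n_in))            # toy inputs
W_in = rng.normal(size=(n_in, n_hidden))   # input -> hidden weights
W_out = rng.normal(size=(n_hidden,))       # hidden -> output weights

def forward(X, W_in, W_out):
    """One-hidden-layer network with tanh activation (illustrative only)."""
    return np.tanh(X @ W_in) @ W_out

out_original = forward(X, W_in, W_out)

# Reordering the hidden units gives a different weight vector, same function.
perm = rng.permutation(n_hidden)
out_permuted = forward(X, W_in[:, perm], W_out[perm])
same_under_permutation = np.allclose(out_original, out_permuted)

# tanh is odd, so negating the weights into and out of the layer cancels.
out_flipped = forward(X, -W_in, -W_out)
same_under_sign_flip = np.allclose(out_original, out_flipped)

# Permutations times per-unit sign flips: equivalent weight settings per mode.
n_equivalent = math.factorial(n_hidden) * 2 ** n_hidden

print(same_under_permutation, same_under_sign_flip, n_equivalent)
```

With only 5 hidden units that is already 3840 equivalent weight settings for one layer, and the count grows factorially with layer width.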

Unless I’m mistaken, NumPy’s stacking constraints mean this requires an equal number of samples in each group, right?

In reality, this is rarely the case. I’ve been thinking of getting around this problem using masked arrays, but before I try that I was wondering if anyone had any intuition of how PyMC3/Theano will handle masked arrays as the input (I’ve only ever seen examples of masked arrays being used as an observed variable).

Alternatively, I’d appreciate any other suggestions to get around this problem. Thanks!
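
One possible alternative to stacking or masking is to keep the data in "long" format and pick each row's weights with a group index vector, the same indexing pattern used in hierarchical regression examples. The sketch below is plain NumPy rather than PyMC3/Theano, and it is an assumption that the pattern carries over to the model in the blog post. The group sizes, shapes, and names (`group_idx`, `W_in`, `W_out`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

group_sizes = [3, 7, 5]                    # unequal samples per group
n_groups = len(group_sizes)
n_in, n_hidden = 2, 5

# Long format: all rows in one array, plus one group label per row.
X = rng.normal(size=(sum(group_sizes), n_in))
group_idx = np.repeat(np.arange(n_groups), group_sizes)

# One weight set per group (these would be random variables in a real model).
W_in = rng.normal(size=(n_groups, n_in, n_hidden))
W_out = rng.normal(size=(n_groups, n_hidden))

# W_in[group_idx] has shape (n_rows, n_in, n_hidden): each row gets the
# weights of its own group, so no equal-size stacking is needed.
hidden = np.tanh(np.einsum("ni,nih->nh", X, W_in[group_idx]))
logits = np.sum(hidden * W_out[group_idx], axis=1)

print(logits.shape)
```

The trade-off is memory: indexing materializes one weight matrix per row, which is fine for small networks but could get expensive for wide layers or large datasets.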