# Hierarchical Bayesian Neural Networks with Informative Priors by @twiecki

One thing I can't help but notice is that the number of hidden units (`n_hidden`) is very small here: just 5 neurons per layer. When I played around with twiecki's previous Bayesian NN examples and did my own experiments, I ran into really severe issues with non-identifiability and multimodality in the posterior as the number of neurons per layer increased.
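To make the non-identifiability concrete, here is a minimal NumPy sketch (not taken from the original post) of a single tanh hidden layer without biases. The shapes, the random data, and the `forward` helper are all made up for illustration. It checks that permuting the hidden units, or flipping the sign of both weight matrices, leaves the network output unchanged, which is one source of the equivalent posterior modes.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 2, 5                      # 5 hidden units, as in the post
X = rng.normal(size=(10, n_in))            # made-up inputs
w_in = rng.normal(size=(n_in, n_hidden))   # input -> hidden weights
w_out = rng.normal(size=(n_hidden,))       # hidden -> output weights

def forward(X, w_in, w_out):
    """Hypothetical one-hidden-layer network: tanh activation, linear output."""
    return np.tanh(X @ w_in) @ w_out

out_original = forward(X, w_in, w_out)

# Relabel the hidden units: same permutation on columns of w_in and entries of w_out.
perm = rng.permutation(n_hidden)
out_permuted = forward(X, w_in[:, perm], w_out[perm])

# tanh is odd, so negating both weight sets cancels out.
out_flipped = forward(X, -w_in, -w_out)

same_under_permutation = np.allclose(out_original, out_permuted)
same_under_sign_flip = np.allclose(out_original, out_flipped)

# Permutations times per-unit sign flips, for this single-layer case only.
n_equivalent = math.factorial(n_hidden) * 2 ** n_hidden
print(same_under_permutation, same_under_sign_flip, n_equivalent)
```

Under these assumptions the count of equivalent weight configurations grows as `n_hidden! * 2**n_hidden`, which is why things get worse quickly as the layer width goes up.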

ADVI fails badly as the number of weights increases for obvious reasons (both mode-seeking and mode-covering behaviour result in poor approximations), and NUTS takes forever and also struggles with the model structure. To me this is the biggest challenge facing the application of Bayesian NNs, and I dunno if it's been satisfactorily solved yet.

Unless I'm mistaken, NumPy's constraints on stacking mean this requires an equal number of samples in each group, right?

In reality, this is rarely the case. I've been thinking of getting around this problem using masked arrays, but before I try that I was wondering if anyone had any intuition for how PyMC3/Theano will handle masked arrays as the input (I've only ever seen examples of masked arrays being used as an observed variable).

Alternatively, I'd appreciate any other suggestions to get around this problem. Thanks!
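For reference, a small NumPy sketch of the stacking constraint described above, plus one possible workaround: zero-padding every group to the largest size and keeping a separate boolean mask. The group sizes, feature count, and names `X_padded` and `mask` are assumptions for illustration, and this says nothing about how PyMC3/Theano treats `numpy.ma` masked arrays as inputs.

```python
import numpy as np

rng = np.random.default_rng(0)
group_sizes = [4, 7, 5]                                # made-up, unequal group sizes
Xs = [rng.normal(size=(n, 2)) for n in group_sizes]    # one (n_samples, 2) array per group

# np.stack needs identical shapes, so unequal groups raise a ValueError.
try:
    np.stack(Xs)
    stack_failed = False
except ValueError:
    stack_failed = True

# Workaround sketch: pad to the largest group and record which rows are real.
max_n = max(group_sizes)
X_padded = np.zeros((len(Xs), max_n, 2))
mask = np.zeros((len(Xs), max_n), dtype=bool)
for g, X in enumerate(Xs):
    X_padded[g, :len(X)] = X      # real samples first, zeros after
    mask[g, :len(X)] = True       # True marks a real sample

print(stack_failed, X_padded.shape, mask.sum())
```

The padded array has a regular shape, so it can go through a batched operation, but the padded rows still produce outputs that have to be excluded later.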

Thanks @twiecki for the post!

I am also trying to get around this issue. Masked arrays won't work here, as some values from the observed variable (those of a group with a smaller size, for example) would also need to be discarded. Is there a way to not take those values into account?

Does anyone have an idea? Thanks
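One way to think about discarding those values, sketched in plain NumPy rather than PyMC3: pad the observed labels as well, then multiply the elementwise log-likelihood by a 0/1 mask so the padded entries contribute nothing. The Bernoulli likelihood, the random probabilities `p` standing in for network outputs, and all names here are assumptions for illustration. Whether and how this translates into a PyMC3 model is untested.

```python
import numpy as np

rng = np.random.default_rng(2)
group_sizes = [4, 7, 5]                                  # made-up, unequal group sizes
Ys = [rng.integers(0, 2, size=n) for n in group_sizes]   # binary labels per group

n_grps, max_n = len(Ys), max(group_sizes)
Y_padded = np.zeros((n_grps, max_n), dtype=int)
mask = np.zeros((n_grps, max_n), dtype=bool)
for g, y in enumerate(Ys):
    Y_padded[g, :len(y)] = y
    mask[g, :len(y)] = True

# Stand-in for the network's predicted probabilities, one per (group, slot).
p = rng.uniform(0.05, 0.95, size=(n_grps, max_n))

def bernoulli_logp(y, p):
    """Elementwise Bernoulli log-likelihood."""
    return y * np.log(p) + (1 - y) * np.log1p(-p)

# Masked total: padded slots are multiplied by 0 and drop out.
masked_total = np.sum(bernoulli_logp(Y_padded, p) * mask)

# Reference: sum over the real observations only, group by group.
per_group_total = sum(
    bernoulli_logp(y, p[g, :len(y)]).sum() for g, y in enumerate(Ys)
)
print(np.isclose(masked_total, per_group_total))
```

The two totals agree, so under these assumptions the padding does not affect the likelihood as long as the mask is applied before summing.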

Can't you just let `Xs` and `Ys` be lists and manage the activations and outputs as lists instead of tensors? i.e. change

`act_1 = pm.math.tanh(tt.batched_dot(Xs, weights_in_1))`

to

`act_1 = [pm.math.tanh(tt.dot(Xs[i], weights_in_1[i, :, :])) for i in range(len(Xs))]`

(etc) ?
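A NumPy stand-in for that list-based idea, just to check the shapes work out with unequal group sizes. `np.tanh` and `np.dot` replace `pm.math.tanh` and `tt.dot` here, and the sizes and random weights are made up; the actual PyMC3 version is untested.

```python
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 2, 5
group_sizes = [4, 7, 5]                                    # made-up, unequal group sizes
Xs = [rng.normal(size=(n, n_in)) for n in group_sizes]     # list of per-group inputs

# One weight matrix per group, stacked along the first axis as in the original model.
weights_in_1 = rng.normal(size=(len(Xs), n_in, n_hidden))

# Per-group first-layer activations, kept as a list instead of one tensor.
act_1 = [
    np.tanh(np.dot(Xs[i], weights_in_1[i, :, :]))
    for i in range(len(Xs))
]

shapes = [a.shape for a in act_1]
print(shapes)
```

Each list entry keeps its own number of rows, so no padding is needed. The later layers and the likelihood would presumably have to be built per group in the same way.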