Hierarchical Bayesian Neural Networks with Informative Priors by @twiecki

New blog post by @twiecki


Very cool stuff, as per usual :slight_smile:

One thing I can’t help but notice is that the number of weights (n_hidden) is very small here, just 5 neurons per layer. When I played around with twiecki’s previous Bayesian NN examples and ran my own experiments, I hit really severe non-identifiability and multimodality in the posterior as the number of neurons per layer increased.

ADVI fails badly as the number of weights increases, for obvious reasons (both mode-seeking and mode-covering behaviour result in poor approximations of a multimodal posterior), and NUTS takes forever and also struggles with the model structure. To me this is the biggest challenge facing the application of Bayesian NNs, and I don’t know whether it’s been satisfactorily solved yet.

Unless I’m mistaken, due to NumPy’s constraints on stacking, this requires an equal number of samples in each group, right?

In reality, this is rarely the case. I’ve been thinking of getting around this problem with masked arrays, but before I try that I was wondering whether anyone has any intuition about how PyMC3/Theano will handle masked arrays as input (I’ve only ever seen examples of masked arrays being used as an observed variable).

Alternatively, I’d appreciate any other suggestions to get around this problem. Thanks!
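
One workaround that avoids stacking altogether is to keep the data in “long” format with a per-sample group index, which is the usual way to express hierarchical models in PyMC3. This is only a minimal sketch of that idea with a single linear layer; the data, shapes, and variable names are made up for illustration:

```python
import numpy as np
import pymc3 as pm

# Hypothetical ragged data in "long" format: one flat design matrix plus
# a group index per row, so the groups never have to be stacked.
X = np.random.randn(13, 3)               # all samples from all groups
y = np.random.randn(13)
group_idx = np.array([0] * 8 + [1] * 5)  # group membership per row
n_groups, n_features = 2, X.shape[1]

with pm.Model():
    # Hierarchical per-group weights (one linear layer for brevity).
    mu_w = pm.Normal("mu_w", 0.0, 1.0, shape=n_features)
    sigma_w = pm.HalfNormal("sigma_w", 1.0)
    w = pm.Normal("w", mu_w, sigma_w, shape=(n_groups, n_features))

    # Pick each row's group-specific weights, then take a rowwise dot product.
    mu = (X * w[group_idx]).sum(axis=-1)

    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu, sigma, observed=y)
```

The indexing trick generalizes to deeper layers, though for a full batched NN forward pass the stacked-tensor layout from the blog post is faster, which is what makes the unequal-group-size question awkward in the first place.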

[quote=“bglick13, post:3, topic:1718”]
In reality, this is rarely the case. I’ve been thinking of getting around this problem using masked arrays, but before I try that I was wondering if anyone had any intuition of how PyMC3/Theano will handle masked arrays as the input (I’ve only ever seen examples of masked arrays being used as an observed variable).[/quote]

Thanks @twiecki for the post!

I am also trying to get around this issue. A masked array won’t work here, because some values of the observed variable (e.g. for a group with a smaller size) would also need to be discarded. Is there a way to not take those values into account?

Does anyone have an idea? Thanks
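
One pattern that might work (not from the original post; the names and shapes below are hypothetical) is to pad every group to the size of the largest one and then zero out the log-likelihood contribution of the padded rows with a pm.Potential, so the padded values never affect the posterior. A minimal sketch with a single linear layer:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Hypothetical ragged data: two groups of different sizes.
X_groups = [np.random.randn(8, 3), np.random.randn(5, 3)]
Y_groups = [np.random.randn(8), np.random.randn(5)]

n_groups = len(X_groups)
n_features = X_groups[0].shape[1]
max_n = max(len(y) for y in Y_groups)

# Pad to a common length and remember which rows are real.
Xs = np.zeros((n_groups, max_n, n_features))
Ys = np.zeros((n_groups, max_n))
mask = np.zeros((n_groups, max_n))
for i, (x, y) in enumerate(zip(X_groups, Y_groups)):
    Xs[i, : len(y)] = x
    Ys[i, : len(y)] = y
    mask[i, : len(y)] = 1.0

with pm.Model():
    w = pm.Normal("w", 0.0, 1.0, shape=(n_groups, n_features))
    sigma = pm.HalfNormal("sigma", 1.0)
    # batched_dot keeps the stacked layout from the blog post.
    mu = tt.batched_dot(Xs, w[:, :, None])[:, :, 0]
    # Evaluate the pointwise log-likelihood, then mask out padded rows.
    logp = pm.Normal.dist(mu=mu, sigma=sigma).logp(Ys)
    pm.Potential("obs", (mask * logp).sum())
```

The downside is that you trade pm.Normal(..., observed=...) for a manual likelihood, so things like posterior predictive sampling need extra work.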

Can’t you just let Xs and Ys be lists and handle the activations and outputs as lists instead of tensors? i.e. change

act_1 = pm.math.tanh(tt.batched_dot(Xs, weights_in_1))

to

act_1 = [pm.math.tanh(tt.dot(Xs[i], weights_in_1[i, :, :])) for i in range(n_groups)]

(etc.)?

Any update on this, i.e. how to handle the dot product in hierarchical networks when the groups do not contain equal numbers of samples and therefore can’t be stacked?

Hi, sorry to come back to this after such a long time, but I was wondering if you’d be able to elaborate. E.g., what is i, and what does your “etc” entail?
Thanks
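
Not the original poster, but here is my reading of that suggestion as a minimal sketch. It assumes Xs and Ys are Python lists of per-group arrays (so the groups can have different lengths), i indexes the groups, and the weight variables follow the shapes from the blog post; the “etc” would be the remaining layers plus a per-group observed variable:

```python
import pymc3 as pm
import theano.tensor as tt

# Assumed setup: Xs and Ys are lists of per-group numpy arrays, and the
# hierarchical weights are defined as in the blog post, with shapes
# (n_groups, n_features, n_hidden), (n_groups, n_hidden, n_hidden),
# and (n_groups, n_hidden) respectively.
n_groups = len(Xs)

# Per-group forward pass: an ordinary tt.dot per group replaces the
# single tt.batched_dot over a stacked tensor.
act_1 = [pm.math.tanh(tt.dot(Xs[i], weights_in_1[i])) for i in range(n_groups)]
act_2 = [pm.math.tanh(tt.dot(act_1[i], weights_1_2[i])) for i in range(n_groups)]
act_out = [pm.math.sigmoid(tt.dot(act_2[i], weights_2_out[i])) for i in range(n_groups)]

# One observed variable per group, so unequal group sizes are fine.
for i in range(n_groups):
    pm.Bernoulli("out_%d" % i, p=act_out[i], observed=Ys[i])
```

This has to live inside the same with pm.Model(): block that defines the weights. The trade-off is a larger computation graph (one set of nodes per group), which can slow down compilation and sampling when there are many groups, whereas batched_dot keeps the graph size constant.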