Thanks again for taking the time to respond.
I think your point about the flexibility of the neural network is important, but I have checked and confirmed that the NN is flexible enough for this. To verify, I took the mean values of the weights and biases from the ADVI fit and manually tweaked them to see how the output changes. It turns out the network is capable of capturing this behavior, if only the fit found the right distributions for the weights and biases. Here is an image of what I did:
Notice how changing b2 (bias 2) changes the output to reflect the variance at Xnorm = -0.5. This is exactly what I expect the fit to produce, but it just isn't happening.
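In code, the check looks roughly like this. This is a minimal sketch in NumPy: the one-hidden-layer architecture, tanh activations, and random parameter values here are stand-ins for my actual network and the ADVI posterior means, and the bias index I perturb is arbitrary.

```python
import numpy as np

# Hypothetical one-hidden-layer network; w1, b1, w2, b2 stand in for
# the posterior mean weights/biases extracted from the ADVI fit.
def forward(x, w1, b1, w2, b2):
    h = np.tanh(x[:, None] * w1 + b1)  # hidden layer, tanh activations
    return h @ w2 + b2                 # linear output

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 9)              # grid over Xnorm

w1, b1 = rng.normal(size=5), rng.normal(size=5)
w2, b2 = rng.normal(size=5), 0.0

base = forward(x, w1, b1, w2, b2)

# Manually tweak one hidden-unit bias, as in the image above.
b1_tweaked = b1.copy()
b1_tweaked[1] += 1.0
tweaked = forward(x, w1, b1_tweaked, w2, b2)

# The change is x-dependent (it passes through the tanh nonlinearity),
# so the tweak reshapes the output rather than just shifting it.
print(np.ptp(tweaked - base) > 0)
```

Perturbing parameters around the posterior means like this is enough to show the parameterization can express the behavior, which is why I concluded the issue is the fit, not the architecture.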
I have thought about modeling the noise explicitly using a separate set of activations, but I think that defeats the purpose of using a Bayesian neural network in the first place: there would no longer be any point in making the weights and biases random variables. I could just build two deterministic networks in a non-probabilistic package like Keras/TensorFlow and train one to return the mean response and the other to return the variance as a function of the input.
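Concretely, that non-Bayesian alternative would amount to training the two networks against a Gaussian negative log-likelihood. A minimal sketch of that loss, where `mu` and `log_var` are hypothetical placeholders for the two networks' outputs:

```python
import numpy as np

# Gaussian negative log-likelihood that the two deterministic networks
# (one predicting the mean, one predicting log-variance) would jointly
# minimize; mu and log_var stand in for the network outputs.
def gaussian_nll(y, mu, log_var):
    return 0.5 * (np.log(2 * np.pi) + log_var + (y - mu) ** 2 / np.exp(log_var))

y = np.array([0.0, 1.0])

# A variance estimate matched to the scatter lowers the loss, which is
# what would drive the variance network during training.
loose = gaussian_nll(y, mu=0.5, log_var=np.log(4.0)).mean()
tight = gaussian_nll(y, mu=0.5, log_var=np.log(0.25)).mean()
print(tight < loose)
```

This gives point estimates of the mean and variance, but no posterior over the parameters, which is exactly what I'd be giving up.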
