I see, so you are saying that the variation in the “mean” predictions themselves should be accounting for the variation in the data. I am not sure if this follows directly from the model (ie if it’s flexible enough for this). Given that you opened this issue in the first place I would say it’s not.
I should warn you that I am not very familiar with Bayesian NNs!
Anyway to model the noise explicitly you can do it just as you did for your mean outputs, i.e., make it depend on a separate set of NN activations. You can use an exponential link to make predictions from the real domain to the positive domain.
You can probably also make this depend just on the output layer of your NN for computational savings. Or you can go even more simple and just use a vanilla GLM based on your NN outputs for the noise component.
Let me know if I sound completely off!