Bayesian Neural Network captures the mean response but not the variance in the training data

Part of the issue might be in how you’re predicting y. Note that when you run sample_posterior_predictive to give you y, the resulting variance includes the noise term (sigma), which is why the mean +/- 2SD plot shows little input dependence in the uncertainty. Rather than sampling y directly, I think you’ll want to make layer_o a named pm.Deterministic variable, then sample/plot that. That will give you the (small, ~0.0005) input-dependent uncertainty in the mean.

More generally, though, I think you might be conflating the Bayesian concept of confidence with the idea of observed variance. Look at what the likelihood term is doing: your BNN is only trying to predict the mean of the data, so the resulting uncertainty reflects the model’s confidence in the mean, not the expected spread in observed values. You haven’t told the model that the variance depends on the inputs, so it doesn’t know to look for it. Keep in mind, too, that it looks like you have several traces there, each with one reading at each λ1. Your model doesn’t know that; it treats each individual reading at a given λ1 as completely independent of those at neighboring λ1 values. Since your BNN has no convolutional structure and your observations are all lumped together in the likelihood, the fact that neighboring measurements are correlated is completely lost.

You could specify a heteroskedastic model, in which sigma depends on the inputs. This could simply be a separate, parallel BNN. You could also try to get clever and replace every Normal prior with an MvNormal prior whose output is a vector with one element for the mean and one for the variance. Or you could go with an entirely different approach, such as a (likely sparse) heteroskedastic Gaussian Process, which does basically the same thing but is arguably simpler than a BNN. The predictions of such a model give you the expected mean and expected spread in the data, just like a vanilla NN, but because it’s Bayesian you also get an estimate of your confidence in that mean and in that spread.

I suspect, though, that that’s not quite what you want. Is there any information on what distinguishes the different traces, or is the spread truly random? If you have anything, you should include it as a coordinate/covariate in your model. You could fit a (homoskedastic) 2D GP or BNN, use hierarchical modeling (see here or here), use a coregionalized GP, etc. With one of these approaches, when you ask your model for a prediction, you specify this extra variable and receive an expected mean with a corresponding confidence. I suspect that’s more what you’re looking for.
