I am trying to use a Bayesian Neural Network (BNN) to model the stress-strain response of a group of materials. The training data consist of Xnorm (= \lambda_1, the inputs) and Ynorm (= \sigma_1, the outputs). Xnorm ranges from -0.5 to 0.5, and, for each value of Xnorm, Ynorm follows some distribution that depends on Xnorm. A plot of my training data is shown below for reference.
I then build a BNN using PyMC3 in which I:
- Assume the weights and biases in the network are normally distributed
- Sample the means and standard deviations of those weights and biases from uniform distributions over some (hopefully) reasonable ranges
- Forward propagate the neural network in the usual manner using tanh activation functions
- Sample the final output from a normal distribution whose mean is the output of the network and whose standard deviation is drawn from a HalfNormal distribution.
When I then fit the model with ADVI and sample from the posterior, the model captures the mean response quite well, but for some reason it cannot capture the variance in the training data. The variance in the training data reaches 0.14 at Xnorm = -0.5 and ~0.01 at Xnorm = 0.1, but the variance in the model's outputs is almost constant everywhere and very different from these values.
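To make the mismatch concrete, this is roughly how I quantify the input-dependent spread in the data, by binning Xnorm and taking the per-bin standard deviation of Ynorm. This is only a sketch: the synthetic Xnorm/Ynorm below are stand-ins with a similar shape, not my actual training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the training data: the noise shrinks as x grows
# (illustrative only; the real Xnorm/Ynorm are not reproduced here)
Xnorm = rng.uniform(-0.5, 0.5, size=2000)
Ynorm = np.tanh(2 * Xnorm) + rng.normal(0, 0.05 + 0.3 * (0.5 - Xnorm))

# Bin x and compute the empirical standard deviation of y in each bin
bins = np.linspace(-0.5, 0.5, 11)
idx = np.clip(np.digitize(Xnorm, bins) - 1, 0, len(bins) - 2)
binned_sd = np.array([Ynorm[idx == b].std() for b in range(len(bins) - 1)])

print(binned_sd)  # much larger near x = -0.5 than near x = 0.5
```

The same binned statistic applied to the model's posterior-predictive draws is what shows the predicted spread staying nearly flat.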
I have been stuck for a long time now and can’t move forward no matter what I try, so I would highly appreciate any ideas, suggestions, criticisms etc. no matter how small they may be.
Here is how I build my model:
```python
ann_input = theano.shared(Xnorm)
ann_output = theano.shared(Ynorm)

n_hidden = 3

# Initialize random weights between each layer
init_1 = np.random.randn(1, n_hidden).astype(theano.config.floatX)
...
init_out = np.random.randn(n_hidden, 1).astype(theano.config.floatX)

ml = -2.5   # Mean lower bound
mu = +2.5   # Mean upper bound
sdl = 0     # Standard deviation lower bound
sdu = 5     # Standard deviation upper bound

with pm.Model() as neural_network:
    # Means and standard deviations of the weights in the neural network
    # w1m == weight 1 mean, w1sd == weight 1 standard deviation, ...
    w1m = pm.Uniform('w1m', lower=ml, upper=mu, shape=(1, n_hidden))
    w1sd = pm.Uniform('w1sd', lower=sdl, upper=sdu, shape=(1, n_hidden))
    ...
    wom = pm.Uniform('wom', lower=ml, upper=mu, shape=(n_hidden, 1))
    wosd = pm.Uniform('wosd', lower=sdl, upper=sdu, shape=(n_hidden, 1))

    # Means and standard deviations of the biases in the neural network
    # b1m == bias 1 mean, b1sd == bias 1 standard deviation, ...
    b1m = pm.Uniform('b1m', lower=ml, upper=mu, shape=n_hidden)
    b1sd = pm.Uniform('b1sd', lower=sdl, upper=sdu, shape=n_hidden)
    ...
    bom = pm.Uniform('bom', lower=ml, upper=mu)
    bosd = pm.Uniform('bosd', lower=sdl, upper=sdu)

    # Weights and biases sampled from normal distributions with the
    # means and standard deviations defined above
    weights_1 = pm.Normal('l1_w', mu=w1m, sd=w1sd, shape=(1, n_hidden), testval=init_1)
    ...
    weights_o = pm.Normal('o_w', mu=wom, sd=wosd, shape=(n_hidden, 1), testval=init_out)
    bias_1 = pm.Normal('l1_b', mu=b1m, sd=b1sd, shape=n_hidden)
    ...
    bias_o = pm.Normal('o_b', mu=bom, sd=bosd)

    # Now assemble the neural network: tanh hidden layers, linear output
    layer_1 = pm.math.tanh(pm.math.dot(ann_input, weights_1) + bias_1)
    layer_2 = pm.math.tanh(pm.math.dot(layer_1, weights_2) + bias_2)
    layer_3 = pm.math.tanh(pm.math.dot(layer_2, weights_3) + bias_3)
    layer_4 = pm.math.tanh(pm.math.dot(layer_3, weights_4) + bias_4)
    layer_o = pm.math.dot(layer_4, weights_o) + bias_o

    # Observation noise: a single global standard deviation
    sigma = pm.HalfNormal('sigma', sd=0.1)
    y = pm.Normal('y', mu=layer_o, sd=sigma, observed=ann_output)
```
Then I fit the model using ADVI:
```python
with neural_network:
    n_samples = 150000
    inference = pm.ADVI()
    approx = pm.fit(n=n_samples, method=inference)
```
```python
with neural_network:
    trace = approx.sample(draws=1000)

ann_input.set_value(X_test)
samples = pm.sample_posterior_predictive(trace, var_names=['y', 'sigma'],
                                         model=neural_network, size=1000)
```
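From those posterior-predictive draws I then compute the predictive mean and spread per test point, roughly as sketched below. The random array here is a stand-in for the real `samples['y']` (assumed shape: draws along axis 0, test points along axis 1).

```python
import numpy as np

# Stand-in for samples['y'] with shape (n_draws, n_test_points);
# the real array comes from pm.sample_posterior_predictive above
y_draws = np.random.default_rng(1).normal(0.0, 0.2, size=(1000, 50))

pred_mean = y_draws.mean(axis=0)  # mean response at each test point
pred_sd = y_draws.std(axis=0)     # "Total" spread at each test point

print(pred_mean.shape, pred_sd.shape)  # (50,) (50,)
```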
Here is what the neural network's predictions look like for the training data shown above:
As you can see, the mean response is similar, but the variance in the NN output is nowhere near that of the data used to train it.
Again, as I said, I would appreciate any and all suggestions, so please don't hesitate to share your opinions. Here are some things I have tried so far:
- Use ReLU activation functions instead of tanh
- Use other inference methods, such as NUTS
- Sample the weights and biases from Normal distributions with fixed means and standard deviations rather than sampling the means and SDs from Uniform distributions.
- Use smaller and larger networks (fewer or more nodes per layer, fewer or more layers, etc.). Reducing the network size only makes the output coarser; the overall behavior stays the same.
- Note that the solution does converge: the loss stops decreasing, and I have tried running the fit for many more iterations as well as using ADVI.refine, but the results don't improve.
- I have also tried manually playing with the weights and biases to see whether this shape of output is even achievable with this network architecture, and it is, given the right weight and bias distributions.
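The manual check in the last point looks roughly like this: a plain NumPy forward pass matching the model's 4-hidden-layer tanh architecture, with hand-picked (here random, purely illustrative) weights and biases instead of the fitted ones.

```python
import numpy as np

def forward(x, Ws, bs):
    """Forward pass matching the PyMC3 model: tanh hidden layers, linear output."""
    a = x.reshape(-1, 1)
    for W, b in zip(Ws[:-1], bs[:-1]):
        a = np.tanh(a @ W + b)
    return a @ Ws[-1] + bs[-1]

rng = np.random.default_rng(2)
n_hidden = 3
# Layer shapes: 1 -> 3 -> 3 -> 3 -> 3 -> 1, as in the model
shapes = [(1, n_hidden)] + [(n_hidden, n_hidden)] * 3 + [(n_hidden, 1)]
Ws = [rng.normal(0, 1, s) for s in shapes]
bs = [rng.normal(0, 1, s[1]) for s in shapes]

x = np.linspace(-0.5, 0.5, 100)
y = forward(x, Ws, bs)
print(y.shape)  # (100, 1)
```

Trying different weight/bias draws in this sketch is how I convinced myself the architecture itself is expressive enough.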
Thank you for your time and thoughts.
Edit: Note that in the top-right plot, “Likelihood” refers to the value of sigma (i.e., the standard deviation of the “y” in the model) exactly, whereas I compute the “Total” variance by taking the standard deviation of all the samples of “y” at every point.
Edit 2: I have uploaded a better-looking predictions plot.