Limit or prevent unrealistic output of neural network

I was trying to use a one-hidden-layer neural network to predict observed rainfall (the target value, the red dots), which should never be negative. The result for each day contains 1000 predictions, which were converted to a probability density and shown in different shades of blue (similar to a posterior distribution, but all plotted together; the green dot is one of the inputs). I also tried a two-layer model and several different ranges of mu and sd for the hidden-layer weights, but it didn't change the results much.

Is there any way to constrain the results so they are not negative? Any information is appreciated. Thanks a lot.

Model information
structure: 7-10-1 (inputs - nodes in hidden layer - output)
w_in_1 mu, sd: 0, 2
w_1_out mu, sd: 0, 2
train_samp: 20000
train_tune: 1000

You can try modeling the observed data with a distribution defined on (0, inf), for example HalfNormal or LogNormal.
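To see why a positive-support likelihood guarantees non-negative predictions: a LogNormal draw is the exponential of a Normal draw, so every sample is strictly positive no matter what value the network outputs for mu. A minimal numpy sketch (the network outputs `mu_pred` here are made up for illustration; in PyMC3 the likelihood line would become something like `pm.Lognormal('out', mu=regression, sd=1., observed=ann_output)`):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network outputs on the log scale; any real value is allowed
mu_pred = rng.normal(loc=0.5, scale=2.0, size=1000)

# LogNormal(mu, sd) draws are exp(Normal(mu, sd)) draws: strictly positive
sd = 1.0
samples = np.exp(mu_pred + sd * rng.normal(size=1000))

print(samples.min() > 0)  # every prediction is positive by construction
```

Interpret the LogNormal mu as the mean of log-rainfall, not of rainfall itself, when reading the posterior back on the original scale.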

Thanks a lot for your reply! I tried

with pm.Model() as neural_network:
    weights_in_1 = pm.Uniform('w_in_1', -1, 1,
                              shape=(X.shape[1], n_hidden1))
    weights_1_out = pm.Normal('w_1_out', w_1_out_mu, sd=w_1_out_sd,
                              shape=(n_hidden1,))
    act_1 = pm.math.tanh(, weights_in_1))
    regression =, weights_1_out)    # pm.math.dot wraps Theano's dot product
    out = pm.Normal('out', mu=regression, sd=np.sqrt(0.9), observed=ann_output)

but got error message

ValueError: Bad initial energy: nan. The model might be misspecified.

Then I tried with [-10000, 10000], and it worked.
I am quite curious: is this a good way to figure out how to assign a prior? (The same goes for the bounds; I am not sure, when I just try values and see if they work…) Many papers and articles say the prior reflects how much information you have about your data before observing it, but in this case I don't really know…

  1. Is it possible to assign different mu and sd to the nodes in the same hidden layer with the function provided? Right now all weights in the first hidden layer (see below) have mu = 0 and sd = 1.
  weights_in_1 = pm.Normal('w_in_1', mu=0, sd=1,
                           shape=(X.shape[1], n_hidden1))

Thanks again for any suggestion.

  1. If you print the logp from the model for all the nodes and there is no inf or nan (see e.g. Getting ‘Bad initial energy: inf’ when trying to sample simple model), but when you sample using the default trace = pm.sample(1000) it throws an error before the first sample, it is quite likely that the jitter in the default initialization jitter+adapt_diag makes some of the inputs invalid. Currently you can either set init='adapt_diag' or init=None. We are in the process of making it more robust.
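The logp check described above can be mimicked outside PyMC3 too. The sketch below uses scipy.stats with a hypothetical all-zeros start point (analogous to `model.test_point`) and the priors from the model in this thread; a nan or -inf means the start point falls outside a prior's support:

```python
import numpy as np
from scipy import stats

# Hypothetical start point, analogous to model.test_point in PyMC3
start = {'w_in_1': np.zeros((7, 10)), 'w_1_out': np.zeros(10)}

# Log-prior of each variable at the start point, mirroring the model above:
# Uniform(-1, 1) for the input weights, Normal(0, 2) for the output weights
logps = {
    'w_in_1': stats.uniform(loc=-1, scale=2).logpdf(start['w_in_1']).sum(),
    'w_1_out': stats.norm(loc=0, scale=2).logpdf(start['w_1_out']).sum(),
}

for name, lp in logps.items():
    # Non-finite logp here would flag a misspecified start point
    print(name, np.isfinite(lp))
```

In PyMC3 itself the equivalent loop is roughly `for RV in model.basic_RVs: print(, RV.logp(model.test_point))`; if every value is finite but sampling still fails, try `pm.sample(1000, init='adapt_diag')` as suggested above.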

  2. Yes, you can pass an array to mu and sd, e.g.:

mu0 = np.random.randn(n_hidden1)
sd0 = np.random.rand(n_hidden1) * 2.
weights_in_1 = pm.Normal('w_in_1', mu=mu0, sd=sd0,
                         shape=(X.shape[1], n_hidden1))

Thanks a lot for the reply!

Is it possible to use a different prior distribution for different weights in the neural network, instead of just different parameters? Thanks a lot.

Yes, you can put a hyperprior on the mu, for example:

mu0 = pm.Normal('hyper_mu', 0., 10., shape=n_hidden1)
weights_in_1 = pm.Normal('w_in_1', mu=mu0, sd=1.,
                         shape=(X.shape[1], n_hidden1))