Deep Neural Network Question

So I’ve gotten a few Bayesian neural networks to run. I haven’t been able to wrap my head around how to make a layer wider. For example, a Keras/TensorFlow model may have a layer with 256 nodes, another with 128, another with 64, and so on.

How does that compare, in terms of coding, when using pymc3? Or does that even matter when using Bayesian methods?

Just change the size of the weights (W) for each layer, such as:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as two_layer:
    W1 = pm.Normal('l1_weight', 0., 1e-3, shape=(256, input_size))
    W2 = pm.Normal('l2_weight', 0., 1e-3, shape=(128, 256))
    L1 = pm.Deterministic('Layer1', tt.nnet.relu(tt.dot(W1, input_layer)))
    L2 = pm.Deterministic('Layer2', tt.nnet.relu(tt.dot(W2, L1)))
    ...

Edit: originally I mis-specified the keyword shape= as size=
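To make a layer wider you change both that layer's row count and the next layer's column count. For example, widening Layer1 from 256 to 512 nodes would look something like this (a sketch, assuming input_size and input_layer are defined as above):

with pm.Model() as two_layer_wide:
    # Layer1 widened from 256 to 512 nodes: its weight matrix gains rows,
    # and the following layer's weight matrix gains the matching columns.
    W1 = pm.Normal('l1_weight', 0., 1e-3, shape=(512, input_size))
    W2 = pm.Normal('l2_weight', 0., 1e-3, shape=(128, 512))
    L1 = pm.Deterministic('Layer1', tt.nnet.relu(tt.dot(W1, input_layer)))
    L2 = pm.Deterministic('Layer2', tt.nnet.relu(tt.dot(W2, L1)))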


Once again @chartl to my rescue! Thank you for the help. Could I bother you with one more question? What is the difference between

trace = pm.sample(draws=15000, init='advi', progressbar=True)

and

with neural_network:
    inference = pm.ADVI()
    approx = pm.fit(n=50000, method=inference)
    trace = approx.sample(draws=5000)

What is actually happening in the background with the two?

Well I thought this worked. Now it’s saying this:

TypeError: __init__() got an unexpected keyword argument 'size'

I’m trying this on the first weight.

n_hidden = 10

# Initialize random weights between each layer

init_1 = np.random.randn(X_train.shape[1], n_hidden)
init_2 = np.random.randn(n_hidden, n_hidden)
init_3 = np.random.randn(n_hidden, n_hidden)
init_4 = np.random.randn(n_hidden, n_hidden)
init_5 = np.random.randn(n_hidden, n_hidden)
init_6 = np.random.randn(n_hidden, n_hidden)
init_7 = np.random.randn(n_hidden, n_hidden)
init_8 = np.random.randn(n_hidden, n_hidden)
init_9 = np.random.randn(n_hidden, n_hidden)
init_10 = np.random.randn(n_hidden, n_hidden)
init_out = np.random.randn(n_hidden)

with pm.Model() as neural_network:
    # Weights from input to hidden layer
    weights_in_1 = pm.Normal('w_in_1', 0, sd=1,
                             shape=(X_train.shape[1], n_hidden),
                             size=(128, X_train.shape[1]),
                             testval=init_1)

    # Weights from 1st to 2nd layer
    weights_1_2 = pm.Normal('w_1_2', 0, sd=1,
                            shape=(128, n_hidden),
                            testval=init_2)

    # Weights from 2nd to 3rd layer
    weights_2_3 = pm.Normal('w_2_3', 0, sd=1,
                            shape=(64, 128),
                            testval=init_3)

    # Weights from 3rd to 4th layer
    weights_3_4 = pm.Normal('w_3_4', 0, sd=1,
                            shape=(64, 64),
                            testval=init_4)

    # Weights from 4th to 5th layer
    weights_4_5 = pm.Normal('w_4_5', 0, sd=1,
                            shape=(32, 64),
                            testval=init_5)

    # Weights from 5th to 6th layer
    weights_5_6 = pm.Normal('w_5_6', 0, sd=1,
                            shape=(32, 32),
                            testval=init_6)

    # Weights from 6th to 7th layer
    weights_6_7 = pm.Normal('w_6_7', 0, sd=1,
                            shape=(32, 16),
                            testval=init_7)

    # Weights from 7th to 8th layer
    weights_7_8 = pm.Normal('w_7_8', 0, sd=1,
                            shape=(16, 16),
                            testval=init_8)

    # Weights from 8th to 9th layer
    weights_8_9 = pm.Normal('w_8_9', 0, sd=1,
                            shape=(8, 16),
                            testval=init_9)

    # Weights from 9th to 10th layer
    weights_9_10 = pm.Normal('w_9_10', 0, sd=1,
                             shape=(1, ),
                             testval=init_10)

    # Weights from hidden layer to output
    #weights_10_out = pm.Normal('w_10_out', 0, sd=1,
    #                           shape=(n_hidden,),
    #                           testval=init_out)

    # Build neural network using relu activation function
    B2 = pm.Normal('bias2', 0., 1.)

    act_1 = T.nnet.relu(T.dot(ann_input, weights_in_1))
    act_2 = T.nnet.relu(T.dot(act_1, weights_1_2))
    act_3 = T.nnet.relu(T.dot(act_2, weights_2_3) + B2)
    act_4 = T.nnet.relu(T.dot(act_3, weights_3_4))
    act_5 = T.nnet.relu(T.dot(act_4, weights_4_5) + B2)
    act_6 = T.nnet.relu(T.dot(act_5, weights_5_6))
    act_7 = T.nnet.relu(T.dot(act_6, weights_6_7) + B2)
    act_8 = T.nnet.relu(T.dot(act_7, weights_7_8))
    #act_9 = T.nnet.relu(T.dot(act_8, weights_9_10) + B2)

    act_out = T.dot(act_8, weights_9_10)

    out = pm.Normal('out', mu=act_out, observed=ann_output, shape=y_train_t.shape)

The former runs the NUTS sampler, but uses ADVI to initialize the starting point and the mass matrix for NUTS. The latter just runs ADVI and then draws samples from the fitted variational approximation.
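Concretely, with comments on what each call is doing (a sketch, assuming the neural_network model context from above):

with neural_network:
    # ADVI is only used to find a starting point and scaling for NUTS;
    # the 15000 draws then come from the NUTS sampler itself.
    trace_nuts = pm.sample(draws=15000, init='advi', progressbar=True)

with neural_network:
    # ADVI is the inference method itself: pm.fit() optimizes the variational
    # approximation for 50000 iterations, and the resulting "trace" is just
    # 5000 independent draws from that fitted approximation.
    inference = pm.ADVI()
    approx = pm.fit(n=50000, method=inference)
    trace_advi = approx.sample(draws=5000)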

That might actually be from my example; it looks like you’ve got a size= somewhere there should be a shape=. In fact, weights_in_1 has both a size= and a shape=.
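Dropping the size= argument and keeping only shape= should clear the TypeError; for the first weight that would look like:

# Weights from input to hidden layer (shape= only, no size= keyword)
weights_in_1 = pm.Normal('w_in_1', 0, sd=1,
                         shape=(X_train.shape[1], n_hidden),
                         testval=init_1)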