Error in Defining Bayesian NN with Multivariate Gaussians

Hi there and thanks for all your work,

We are trying to build a Bayesian Neural Network with Multivariate Gaussians as distributions for our parameters. We are following the structure of the code presented in the tutorial

with univariate Normal assuming a factorization of the posterior.

We counter problems with the incompatibility of dimensions. Here is our code:

def construct_nn(ann_input, ann_output):
    n_hidden = 5 #number of neurons in each hidden layer

    mu_in_1 = np.zeros((X_train.shape[1] * n_hidden), dtype=floatX)
    cov_in_1 = np.eye(X_train.shape[1] * n_hidden, dtype=floatX)

    mu_1_2 = np.zeros((n_hidden * n_hidden), dtype=floatX)
    cov_1_2 = np.eye(n_hidden * n_hidden, dtype=floatX)

    mu_2_out = np.zeros(n_hidden, dtype=floatX)
    cov_2_out = np.eye(n_hidden, dtype=floatX)

    coords = {
        "hidden_layer_1": np.arange(n_hidden),
        "hidden_layer_2": np.arange(n_hidden),
        "train_cols": np.arange(X_train.shape[1]),
        # "obs_id": np.arange(X_train.shape[0]),
    }

    with pm.Model(coords=coords) as neural_network:
        ann_input = pm.Data("ann_input", X_train, mutable=True, dims=("obs_id", "train_cols"))
        ann_output = pm.Data("ann_output", Y_train, mutable=True, dims="obs_id")
        
        # Weights from input to hidden layer 
        #The prior for the weights connecting the input to the first hidden layer is a normal distribution
        weights_in_1 = pm.MvNormal("w_in_1", mu=mu_in_1, cov=cov_in_1, dims=("train_cols", "hidden_layer_1"))
        # Weights from 1st to 2nd layer
        #The prior for the weights connecting the first hidden layer to the second is a normal distribution
        weights_1_2 = pm.MvNormal("w_1_2", mu=mu_1_2, cov=cov_1_2, shape=(n_hidden, n_hidden))

        #The prior for the weights connecting the second hidden layer to the output layer is a normal distribution
        # Weights from hidden layer to output
        weights_2_out = pm.MvNormal("w_2_out", mu=mu_2_out, cov=cov_2_out, dims="hidden_layer_2")

        # Build neural-network using tanh activation function
        act_1 = pm.math.tanh(pm.math.dot(ann_input, weights_in_1))
        act_2 = pm.math.tanh(pm.math.dot(act_1, weights_1_2))
        act_out = pm.math.sigmoid(pm.math.dot(act_2, weights_2_out))


        # Binary classification -> Bernoulli likelihood
        #The likelihood of the model is a Bernoulli distribution, it is used when calling pm.sample() or pm.fit()
        out = pm.Bernoulli(
            "out",
            act_out, #The probability of success for each trial (i.e., the probability of outputting a 1)
            observed=ann_output, #The observed data
            total_size=Y_train.shape[0],  # IMPORTANT for minibatches
            dims="obs_id",
        )
    return neural_network


neural_network = construct_nn(X_train, Y_train)

The Neural network gets defined but when we train it using this code, we face the following error:

%%time

with neural_network:
    approx = pm.fit(n=30_000) # approx represents the variational approximation to the posterior distribution

ValueError: Incompatible Elemwise input shapes [(None, 5), (None, 25)]

Your problems arise in here

 act_1 = pm.math.tanh(pm.math.dot(ann_input, weights_in_1))
 act_2 = pm.math.tanh(pm.math.dot(act_1, weights_1_2))
 act_out = pm.math.sigmoid(pm.math.dot(act_2, weights_2_out))

Shape problems are generally very easy to diagnose. Just break inside your code and do evaluations like

act_1.eval()
act_2.eval()
act_out.eval()

You will see that act_2 and act_out fails even though all the inputs like weights also .eval() fine. So there is then a problem with the dimensions of things you are trying to dot. act_1 is shape (500,10) where as weights_1_2 is (5,25). Cant dot them

1 Like