Binary neural network training

I’m training a binary neural network (weights constrained to be either 0 or 1). My model is

y_i ~ Bernoulli(p_i),   logit(p_i) = z_{i,2} - z_{i,1},   z_i = W x_i,   W_{jk} ~ Bernoulli(0.5)

In JAGS it looks like

model {
  for (i in 1:length(y)) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- z[i,2] - z[i,1]
    z[i,1:2] <- w %*% x[i,]
  }
  for (i_output in 1:2) {
    for (j_input in 1:25) {
      w[i_output, j_input] ~ dbern(0.5)
    }
  }
}

I started simple: a 25-dimensional binary vector (a flattened 5x5 MNIST image) is fed into a network with binary weights and binary outputs. The label 0 or 1 is taken as the argmax of the softmax of the activations W * x, like so:
y_predicted = argmax{ softmax(W * x) }
Once I’m done with this toy example, I’ll move on to full 28x28 MNIST with a true softmax (not a logit).
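
(Aside: softmax is monotone, so argmax(softmax(z)) equals argmax(z); for two classes that reduces to the sign of z_2 - z_1, which is exactly what the logit line in the JAGS model exploits. A quick numpy sketch with made-up data:)

import numpy as np

W = np.random.randint(0, 2, size=(2, 25))   # binary weight matrix
x = np.random.randint(0, 2, size=25)        # flattened 5x5 binary image
z = W @ x                                   # activations
softmax = np.exp(z) / np.exp(z).sum()
y_predicted = np.argmax(softmax)            # same as np.argmax(z), i.e. int(z[1] - z[0] > 0)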

I’d like to port this toy model to PyMC3. I’m struggling with the matrix multiplication W * x. How do I implement such a model?
What I expect to see in PyMC3 is something like this:

import pandas as pd
import pymc3 as pm

data = pd.read_csv("https://www.dropbox.com/s/l7uppxi1wvfj45z/MNIST56_train.csv?dl=1")
x = data.iloc[:, :25]  # first 25 columns: flattened 5x5 MNIST image
y = data.iloc[:, 25]   # last column: label 0 or 1

basic_model = pm.Model()  # am I right here?

with basic_model:
    # Priors for unknown model parameters
    w = pm.Bernoulli('w', p=0.5)
    logit_p = w %*% x  # R syntax - what is the PyMC3 equivalent?
    y_obs = pm.Bernoulli('y_obs', logit_p=logit_p, observed=y)

    step = pm.Slice()  # what sampler to use??
    trace = pm.sample(1000, step=step, njobs=1, chains=3, n_init=1000)

You need to make sure the shapes are correct here: since x has shape (n, 25), the weights should have shape (25, 1). Something like the below should work:

import theano.tensor as tt

with basic_model:
    w = pm.Bernoulli('w', p=0.5, shape=(25, 1))  # one binary weight per input pixel
    logit_p = tt.dot(x, w)                       # matrix multiplication in Theano
    ...
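
Putting the pieces together, a minimal end-to-end sketch might look like this (assuming the CSV layout above; PyMC3 should assign BinaryGibbsMetropolis to the discrete w automatically, so no step method needs to be specified):

import pandas as pd
import pymc3 as pm
import theano.tensor as tt

data = pd.read_csv("https://www.dropbox.com/s/l7uppxi1wvfj45z/MNIST56_train.csv?dl=1")
x = data.iloc[:, :25].values
y = data.iloc[:, 25].values

with pm.Model() as basic_model:
    w = pm.Bernoulli('w', p=0.5, shape=(25, 1))
    logit_p = tt.dot(x, w).flatten()  # (n, 1) -> (n,) to match y
    y_obs = pm.Bernoulli('y_obs', logit_p=logit_p, observed=y)
    trace = pm.sample(1000, chains=3)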

@junpenglao
Thanks, that does the job for a binary classifier. The complete and ready-to-run source code is here.

Now I want to switch to a multi-class problem, where the output of my network is a softmax that gives the probability of belonging to each of 10 classes. How do I model such a layer? Something like a multivariate Bernoulli distribution constrained to have only one bit active among the 10… I know this is not strictly a PyMC-related question, though.

You can extend the dimensions of w so that the dot product returns an (n, k) matrix where k=10:

with basic_model:
    w = pm.Bernoulli('w', p=0.5, shape=(25, 10))
    p = tt.nnet.softmax(tt.dot(x, w))

There is a similar example here: Multinomial with Random Effects on panel data (shape issues)
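
For completeness, a sketch of how the likelihood might be attached to the softmax output (pm.Categorical is equivalent to a Multinomial with n=1; the shapes above are assumed):

with pm.Model() as multiclass_model:
    w = pm.Bernoulli('w', p=0.5, shape=(25, 10))
    p = tt.nnet.softmax(tt.dot(x, w))                 # (n, 10) row-wise class probabilities
    y_obs = pm.Categorical('y_obs', p=p, observed=y)  # y holds integer labels 0-9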

Yes, a Multinomial with n=1 seems to capture what I need. Yet sampling even 10 draws of a 784x10 binary matrix on the full 60000-image MNIST dataset would take days. The idea was to take advantage of Bayesian techniques to optimize neural networks in the case where the parameter space is small (constrained to be binary). This example shows that the direct approach is hopeless in terms of training time compared to traditional gradient-based optimization.

Anyway, for those who are interested, the source is here.

Yeah, sampling a large binary matrix is really slow in PyMC3 - you could instead relax the requirement and sample a matrix of continuous values between 0 and 1.
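
A minimal sketch of that relaxation, assuming a Uniform(0, 1) prior in place of the Bernoulli (a Beta prior would also work); with all variables continuous, PyMC3 can use NUTS:

with pm.Model() as relaxed_model:
    w = pm.Uniform('w', lower=0., upper=1., shape=(25, 10))  # continuous stand-in for binary weights
    p = tt.nnet.softmax(tt.dot(x, w))
    y_obs = pm.Categorical('y_obs', p=p, observed=y)
    trace = pm.sample(1000)  # NUTS is assigned automatically for continuous models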