I’m training a binary neural network (weights constrained to be either 0 or 1). In JAGS the model looks like:
model {
  for (i in 1:length(y)) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- z[i,2] - z[i,1]
    z[i,1:2] <- w %*% x[i,]
  }
  for (i_output in 1:2) {
    for (j_input in 1:25) {
      w[i_output, j_input] ~ dbern(0.5)
    }
  }
}
I started simple: a vector of 25 binary values (a flattened 5x5 MNIST image) is fed into a network with binary weights and binary outputs, and the label 0 or 1 is taken as the argmax of the softmax of the activations W * x:

y_predicted = argmax{ softmax(W * x) }
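As a sanity check on how the argmax-softmax prediction connects to the JAGS logit formulation: softmax is monotone, so argmax(softmax(z)) equals argmax(z), and for two classes the softmax probability of class 1 reduces to the logistic function of z[1] - z[0] — which is exactly the `logit(p[i]) <- z[i,2] - z[i,1]` line in the model. A small NumPy sketch with toy W and x (not the actual MNIST data):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(2, 25))   # binary weights, 2 outputs x 25 inputs
x = rng.integers(0, 2, size=25)        # one binary 5x5 image, flattened

z = W @ x                              # activations, shape (2,)
softmax = np.exp(z - z.max()) / np.exp(z - z.max()).sum()

# softmax is monotone, so argmax over probabilities equals argmax over activations
assert np.argmax(softmax) == np.argmax(z)

# for two classes, P(class 1) is the logistic (sigmoid) of z[1] - z[0]
sigmoid = 1.0 / (1.0 + np.exp(-(z[1] - z[0])))
assert np.isclose(softmax[1], sigmoid)
```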
Once this toy example works, I’ll move on to the full 28x28 MNIST with a true softmax (not just a logit).
I’d like to port this toy model to PyMC3. I’m struggling with the matrix multiplication W * x. How can I implement such a model?
What I expect to see in PyMC3 is something like this:
import pandas as pd
import pymc3 as pm

data = pd.read_csv("https://www.dropbox.com/s/l7uppxi1wvfj45z/MNIST56_train.csv?dl=1")
x = data.iloc[:, :25].values  # MNIST 5x5, flattened to 25 binary pixels
y = data.iloc[:, 25].values   # label 0 or 1

basic_model = pm.Model()  # am I right here?
with basic_model:
    # Priors for unknown model parameters: one binary weight per (output, input) pair
    w = pm.Bernoulli('w', p=0.5, shape=(2, 25))
    z = pm.math.dot(x, w.T)        # activations, shape (n_samples, 2)
    logit_p = z[:, 1] - z[:, 0]    # two-class softmax reduces to this logit
    y_obs = pm.Bernoulli('y_obs', logit_p=logit_p, observed=y)
    step = pm.BinaryGibbsMetropolis([w])  # what sampler to use??
    trace = pm.sample(1000, step=step, chains=3, cores=1)
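Once sampling works, one way to turn the posterior draws of w into predictions is to take the per-draw argmax and then a majority vote across draws. The sketch below is hypothetical: it assumes the posterior samples come out as an array of shape (n_draws, 2, 25), which I fake here with random binary matrices purely to show the shapes involved.

```python
import numpy as np

rng = np.random.default_rng(1)
# stand-in for trace['w']: assumed shape (n_draws, 2, 25) of 0/1 weights
w_draws = rng.integers(0, 2, size=(500, 2, 25))
x_test = rng.integers(0, 2, size=(10, 25))  # 10 fake flattened 5x5 images

# activations for every draw and test point: shape (n_draws, n_test, 2)
z = np.einsum('dij,nj->dni', w_draws, x_test)
# per-draw argmax prediction, then majority vote across the posterior
votes = (z[..., 1] > z[..., 0]).mean(axis=0)  # fraction of draws predicting class 1
y_pred = (votes > 0.5).astype(int)            # shape (n_test,)
```

With the real trace, replacing `w_draws` by the stacked posterior samples of `w` gives Bayesian model-averaged predictions rather than predictions from a single point estimate.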
