Hi folks
I built this model to try to classify the MNIST digits with a PyMC model, very similar to Bayesian Deep Learning Part II: Bridging PyMC3 and Lasagne to build a Hierarchical Neural Network — While My MCMC Gently Samples. It’s more of an exercise, so I’m not trying to maximize the model’s score. I’m using the mnist pypi package to load the images and labels. I do a bit of work to reshape the model input into a (train_set_size, 28x28)-shaped tensor, but beyond that I don’t do much data transformation.
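For reference, the loading/reshaping step looks roughly like this (I believe these are the accessor names in the mnist package; the reshape is the only real transform):

```python
import mnist

# load the raw arrays via the `mnist` pypi package
xtr = mnist.train_images()   # shape (60000, 28, 28), uint8 pixel values
ytr = mnist.train_labels()   # shape (60000,), digit labels 0-9

# collapse each 28x28 image into a flat 784-vector for the dense layers
xtr_collapsed = xtr.reshape(xtr.shape[0], -1)
```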
My model is the following:

```python
import operator

import aesara.tensor as tt
import pymc as pm

with pm.Model() as model:
    input_layer = pm.MutableData("Input data", xtr_collapsed)
    # half normal is dumb here, testing if it fixes the bad sampling.
    sigma = 10
    # layer 1: 784 -> 10
    w1 = pm.HalfNormal(name="weights_L1", sigma=sigma, shape=(operator.mul(*xtr.shape[1:]), 10))
    b1 = pm.HalfNormal(name="bias_L1", sigma=sigma)
    # layer 2: 10 -> 10
    w2 = pm.HalfNormal(name="weights_L2", sigma=sigma, shape=(10, 10))
    b2 = pm.HalfNormal(name="bials_L2", sigma=sigma)
    d1 = pm.math.dot(input_layer, w1) + b1
    d1_act = tt.nnet.relu(d1)
    d2 = pm.math.dot(d1_act, w2) + b2
    d2_act = pm.Deterministic("prob_vector", tt.nnet.softmax(d2))
    out = pm.Categorical("output_image", p=d2_act, shape=(10,), observed=ytr)
    trace = pm.sample(1000)
```
However, this almost never works: the model fails to evaluate at the starting point.
```
SamplingError: Initial evaluation of model at starting point failed!
Starting values:
{'weights_L1_log__': array([[1.37482088, 1.73278893, 2.99339127, ..., 3.19726469, 2.48872794,
1.61221258],
[2.26450808, 1.61254952, 2.66221272, ..., 2.48100945, 1.88426663,
3.19736027],
[2.7489012 , 1.76085018, 2.90702417, ..., 2.53959523, 2.67068885,
2.38506311],
...,
[3.05376546, 3.25595545, 2.90500401, ..., 1.76518558, 1.97759702,
2.79381552],
[2.11998757, 2.60491563, 1.62591751, ..., 2.97126484, 1.37402504,
1.42363439],
[3.05235059, 2.14056238, 1.56912392, ..., 1.82148841, 2.6710015 ,
2.95111808]]), 'bias_L1_log__': array(3.09719118), 'weights_L2_log__': array([[2.24883723, 1.63096953, 2.1467081 , 3.05354696, 1.42033742,
1.68394757, 2.8135108 , 1.93633338, 1.45380023, 1.86627278],
[2.60919593, 3.12426194, 2.78506874, 2.98095268, 2.71895375,
2.83157437, 2.25913873, 1.81027324, 2.01794709, 1.87127218],
[3.05540505, 1.37352505, 2.91633462, 2.22470959, 3.0643761 ,
1.66476466, 2.51197866, 1.81694154, 2.63089558, 2.21638313],
[1.34984191, 2.490901 , 1.5030485 , 3.00968011, 1.91147255,
2.20350087, 1.98295273, 3.25073676, 2.09514283, 2.17073368],
[2.50043393, 2.14080139, 1.63344651, 1.30433951, 1.65032382,
2.6382024 , 1.60630552, 2.10396354, 2.34055179, 3.07468549],
[2.71416466, 1.91966414, 1.44767031, 3.02037052, 1.62546005,
2.14546952, 1.57661002, 3.11709261, 2.83912894, 3.12258633],
[1.37028274, 3.14157269, 2.33021313, 2.00895707, 2.71446165,
1.88357063, 1.3826265 , 2.47217179, 2.82311552, 1.88598325],
[2.24698109, 3.07370228, 2.72725184, 1.97487037, 1.48434121,
3.11197024, 1.52367014, 2.74767257, 2.53206642, 2.83946842],
[1.99620709, 2.87543325, 2.48868871, 1.97008951, 2.02698368,
1.79207962, 2.49672451, 1.92625075, 2.23352972, 2.46698243],
[2.79665175, 2.23985067, 2.66792614, 2.49033473, 1.98972399,
2.41311927, 1.69043013, 2.09696587, 2.01460189, 2.97104148]]), 'bials_L2_log__': array(2.40026694)}
Initial evaluation results:
{'weights_L1': -8859.84, 'bias_L1': -1.88, 'weights_L2': -105.45, 'bials_L2': -0.74, 'output_image': -inf}
```
I’m reasonably confident the issue is in the output_image RV, specifically in the probability vector produced by the softmax call in aesara. I’ve seen others mention numeric instability with theano’s softmax, but I’m not super sure that’s what’s happening here.
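To illustrate the kind of failure I suspect (plain numpy, not the model itself): with HalfNormal(sigma=10) weights dotted against 784 pixel intensities, the logits can easily get large enough that a naive softmax overflows, and a nan/zero probability vector makes the Categorical logp -inf.

```python
import numpy as np

# logits on roughly the scale that HalfNormal(sigma=10) weights
# times 784 pixel values can produce
logits = np.array([800.0, 1.0, 2.0])

# naive softmax: np.exp(800) overflows to inf, giving nan/0 probabilities
naive = np.exp(logits) / np.exp(logits).sum()
print(naive)  # [nan  0.  0.] -- an invalid probability vector, logp of -inf
```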
I instead tried the log-sum-exp trick by defining my own log-softmax (subtracting the row max so the exp can’t overflow):

```python
def log_softmax(p_before_softmax, T=tt):
    # shift by the row max before exponentiating so exp() cannot overflow
    shifted = p_before_softmax - T.max(p_before_softmax, axis=1, keepdims=True)
    return shifted - T.log(T.sum(T.exp(shifted), axis=1, keepdims=True))
```
which I can then pass as logit_p to the Categorical stochastic variable (sketch below), but I still see the same failure. Is this indeed an issue with some bizarre numeric instability, or is my model poorly defined?
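For concreteness, this is roughly how I wire it in (the log_prob_vector name is just for illustration; everything else in the model is unchanged):

```python
# inside the same model context, replacing the softmax/Categorical lines
d2_logp = pm.Deterministic("log_prob_vector", log_softmax(d2))
out = pm.Categorical("output_image", logit_p=d2_logp, observed=ytr)
```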