I am trying to implement the following model, which is intended to represent an election cycle. The Dirichlet models the expected support of the various parties on election day, and the Gaussian random walk models the changes during the cycle, counting back from election day.
However, I am getting an optimization failure:
ERROR (theano.gof.opt): Optimization failure due to: graph_merge_softmax_with_crossentropy_softmax
ERROR (theano.gof.opt): node: SoftmaxWithBias(w, v1)
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
File "C:\Users\yitzhak.sapir\AppData\Local\Continuum\miniconda3\envs\pymc3-theano\lib\site-packages\theano\gof\opt.py", line 2034, in process_node
replacements = lopt.transform(node)
File "C:\Users\yitzhak.sapir\AppData\Local\Continuum\miniconda3\envs\pymc3-theano\lib\site-packages\theano\tensor\nnet\nnet.py", line 1937, in graph_merge_softmax_with_crossentropy_softmax
if x_client[0].op == crossentropy_softmax_argmax_1hot_with_bias:
AttributeError: 'str' object has no attribute 'op'
The following is a minimal example that reproduces the issue, with comments explaining the purpose of each step:
import numpy as np
import pymc3 as pm
import theano.tensor as T

with pm.Model() as poll_model:
    # Dirichlet model of support
    v = pm.Dirichlet('v', np.ones(3), shape=3, testval=[0.2, 0.3, 0.7])
    # transform to log-ratio (last component is the base)
    v1 = pm.Deterministic('v1', T.log(v[0:-1] / v[-1]))
    # model walk in log-ratio space
    lkj = pm.LKJCholeskyCov('lkj', eta=50, n=2, sd_dist=pm.HalfCauchy.dist(2.5))
    chol = pm.expand_packed_triangular(2, lkj)
    w_ = pm.MvGaussianRandomWalk('w_', chol=chol, shape=[4, 2])
    # recover supports including walk
    w = pm.Deterministic('w', w_ - w_[0])  # .reshape((1, w_.shape[1])).repeat(4, axis=0))
    ea = pm.Deterministic('ea', T.exp(v1 + w))
    # normalize (intentionally leaving out the base used for log-ratio)
    mu = pm.Deterministic('mu', ea / ea.sum(axis=1).reshape((ea.shape[0], 1)))
    # model observed polls
    x = pm.MvNormal('p', chol=chol, mu=mu, observed=[
        [0.1, 0.9],
        [0.2, 0.8],
        [0.4, 0.6],
        [0.8, 0.2],
    ])
    samples = pm.sample(10, n_init=8, njobs=1, init='advi')
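To make the intended arithmetic concrete, here is a rough NumPy sketch of the deterministic part of the model (the log-ratio, the added walk, and the normalization), outside of PyMC3. The helper names `log_ratio` and `normalized_support` and the support values are made up for illustration, and the walk is simply set to zero.

```python
import numpy as np

def log_ratio(v):
    """Log-ratio of every component except the last, taken against the last."""
    v = np.asarray(v, dtype=float)
    return np.log(v[:-1] / v[-1])

def normalized_support(v1, w):
    """Mirror the 'ea' and 'mu' steps: exponentiate, then normalize each row.

    v1 has shape (k,) and w has shape (t, k). The base party is left out
    of the normalization, as in the model.
    """
    ea = np.exp(v1 + w)                        # v1 broadcasts over the t rows
    return ea / ea.sum(axis=1, keepdims=True)  # each row sums to 1

v = np.array([0.2, 0.3, 0.5])  # hypothetical support values on the simplex
v1 = log_ratio(v)
w = np.zeros((4, 2))           # a walk that never moves, for illustration only
mu = normalized_support(v1, w)
```

With a zero walk, every row of `mu` is just the first two supports renormalized to sum to 1, which is what I expect the model to produce on election day.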
I thought the subtraction w_ - w_[0] might be the cause, but I get the failure even without it. In any case, it is important to me that the Gaussian walk start at exactly 0.
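What I mean by starting at exactly 0 can be shown with a small NumPy sketch. It assumes an ordinary cumulative-sum random walk with made-up step sizes and is not the PyMC3 distribution itself: subtracting the first row pins the first time point to zero and leaves the increments between time points unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical walk: 4 time points, 2 log-ratio dimensions.
steps = rng.normal(size=(4, 2))
walk = np.cumsum(steps, axis=0)

# Subtract the first row so the walk is exactly zero at the first time point.
anchored = walk - walk[0]
```

The same broadcasting is what `w_ - w_[0]` relies on in the model: a row of shape (2,) is subtracted from every row of a (4, 2) array.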
My model also runs very slowly on my machine, but first I’d like to complete the model before tackling that.