Optimization failure when running model

I am trying to implement the following model, which is intended to represent an election cycle. The Dirichlet models the expected support for the various parties on election day, and the Gaussian random walk models the changes during the cycle, counting back from election day.

However, I am getting an optimization failure:

ERROR (theano.gof.opt): Optimization failure due to: graph_merge_softmax_with_crossentropy_softmax
ERROR (theano.gof.opt): node: SoftmaxWithBias(w, v1)
ERROR (theano.gof.opt): TRACEBACK:
ERROR (theano.gof.opt): Traceback (most recent call last):
  File "C:\Users\yitzhak.sapir\AppData\Local\Continuum\miniconda3\envs\pymc3-theano\lib\site-packages\theano\gof\opt.py", line 2034, in process_node
    replacements = lopt.transform(node)
  File "C:\Users\yitzhak.sapir\AppData\Local\Continuum\miniconda3\envs\pymc3-theano\lib\site-packages\theano\tensor\nnet\nnet.py", line 1937, in graph_merge_softmax_with_crossentropy_softmax
    if x_client[0].op == crossentropy_softmax_argmax_1hot_with_bias:
AttributeError: 'str' object has no attribute 'op'

The following minimal code reproduces the issue; the comments explain the purpose of each step:

import numpy as np
import pymc3 as pm
import theano.tensor as T

with pm.Model() as poll_model:
  # dirichlet model of support
  v  = pm.Dirichlet('v', np.ones(3), shape=3, testval=[0.2, 0.3, 0.7])
  # transform to log-ratio 
  v1 = pm.Deterministic('v1', T.log(v[0:-1]/v[-1]))
  # model walk in log-ratio space
  lkj = pm.LKJCholeskyCov('lkj', eta=50, n=2, sd_dist=pm.HalfCauchy.dist(2.5))
  chol = pm.expand_packed_triangular(2, lkj)
  w_ = pm.MvGaussianRandomWalk('w_', chol=chol, shape=[4, 2])
  # recover supports including walk
  w = pm.Deterministic('w', w_ - w_[0]) #.reshape((1, w_.shape[1])).repeat(4, axis=0))
  ea = pm.Deterministic('ea', T.exp(v1 + w))
  # normalize (intentionally leaving out the base used for log-ratio)
  mu = pm.Deterministic('mu', ea / ea.sum(axis=1).reshape((ea.shape[0],1)))
  # model observed polls
  x = pm.MvNormal('p', chol=chol, mu=mu, observed = [
                    [ 0.1, 0.9 ],
                    [ 0.2, 0.8 ],
                    [ 0.4, 0.6 ],
                    [ 0.8, 0.2 ]
                ])
        
  samples = pm.sample(10, n_init=8, njobs=1, init='advi')

I thought it might be my subtraction w_ - w_[0], but I get the failure even without it. In any case, it is important to me that the Gaussian walk starts at exactly 0.
My model also runs very slowly on my machine, but I’d like to complete the model first before tackling that.
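
For reference, a small NumPy sketch of what the subtraction w_ - w_[0] is meant to do, outside of PyMC3. The array names and the random stand-in for the walk are illustrative assumptions, not part of the model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for w_: an unconstrained 2-D random walk over 4 time steps,
# built as the cumulative sum of independent Gaussian increments.
steps = rng.normal(size=(4, 2))
walk = np.cumsum(steps, axis=0)

# Subtracting the first row broadcasts (4, 2) - (2,) and shifts the whole
# path so that it starts at exactly zero.
pinned = walk - walk[0]

print(pinned[0])     # first row of the shifted walk
print(pinned.shape)  # shape is unchanged
```

The shift leaves the increments between consecutive steps unchanged; only the starting point moves.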

Hmm, this might be a Theano optimization error. Separately, rescaling and centering latent variables usually makes inference very difficult, because the model becomes unidentifiable.

The problem here is this line:

mu = pm.Deterministic('mu', ea / ea.sum(axis=1).reshape((ea.shape[0],1)))

I haven't found a way to fix it yet, though.

Thank you for the response.

I managed to resolve it using:

        ea_sum = T.repeat(ea.sum(axis=1), num_parties, axis=0).reshape([num_days,num_parties])
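
As a sanity check, the repeat/reshape form should give the same values as the original broadcast division. A minimal NumPy sketch of that equivalence, assuming num_days = 4 and num_parties = 2 to match the shape of ea in the toy model above (NumPy stands in for Theano here, so it says nothing about the Theano optimizer itself):

```python
import numpy as np

num_days, num_parties = 4, 2  # assumed sizes, matching ea in the toy model

rng = np.random.default_rng(1)
ea = np.exp(rng.normal(size=(num_days, num_parties)))

# Original formulation: reshape the row sums to a column and broadcast.
mu_broadcast = ea / ea.sum(axis=1).reshape((ea.shape[0], 1))

# Workaround: materialize the row sums as a full (num_days, num_parties) array.
# repeat() turns [s0, s1, ...] into [s0, s0, s1, s1, ...], and the reshape
# puts each day's sum on its own row.
ea_sum = np.repeat(ea.sum(axis=1), num_parties, axis=0).reshape([num_days, num_parties])
mu_repeat = ea / ea_sum

print(np.allclose(mu_broadcast, mu_repeat))
```

Both versions normalize each row to sum to 1; the workaround only avoids the broadcastable column in the graph.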

My guess is that the “str” in the error message refers to the “x” in a dimshuffle argument.
As for the centering/rescaling, if that refers to the division by the sum, I was basing it on the log-ratio transform suggested here (with code in Stan):
http://www.marcel-neunhoeffer.com/publication/pa_forecast-multiparty/
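
To make the transform concrete, here is a rough NumPy sketch of the additive log-ratio as it is written in the model above (log of each share relative to the last one) together with its inverse. The function names alr and alr_inverse and the example shares are my own illustrative choices, not taken from the linked code:

```python
import numpy as np

def alr(p):
    """Additive log-ratio: log of each share relative to the last (base) share."""
    p = np.asarray(p, dtype=float)
    return np.log(p[:-1] / p[-1])

def alr_inverse(y):
    """Map log-ratios back to shares that sum to 1, base share included."""
    e = np.exp(np.append(y, 0.0))  # the base has log-ratio 0 by construction
    return e / e.sum()

shares = np.array([0.2, 0.3, 0.5])  # hypothetical party shares
y = alr(shares)
recovered = alr_inverse(y)

# Normalizing only the non-base terms, as mu does in the model above,
# gives the shares among the non-base parties instead.
partial = np.exp(y) / np.exp(y).sum()

print(recovered)
print(partial)
```

The full inverse recovers all three shares, while the partial normalization rescales the first two so that they sum to 1 on their own.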

I’m now considering other ways to represent progress through the election cycle.
