Reinfocement learning model - derivative of RV is zero



I’ve been trying to implement a reinforcement learning model in PyMC3 and am running into a “Mass matrix contains zeros on the diagonal” error when sampling (thanks by the way to @Maria - this post helped a lot Modeling reinforcement learning of human participant using PyMC3). I’ve not had any such trouble with simpler variants of the same model family.

A notebook with all the code etc is here and example data is here example_data.csv (222.6 KB).

From what I’ve seen in previous posts, common solutions to this problem are:

  1. Making priors less vague - I’ve tried drastically reducing the variance on my priors and this doesn’t seem to have any effect.
  2. Avoiding overflows - as far as I can see this shouldn’t be a problem given the parameterisation of the model.

Is there any other obvious cause of this problem that I could be missing?




Having been dealing with obtuse models in the last few months, "Mass matrix contains zeros on the diagonal", or some "derivative of RV is zero" error are usually the worst errors you can see, and are only one step away from absolute chain failures (which halt your whole model). I solved them by following these steps.

  1. Start with the vaguest priors first, uniform, and keep narrowing them down (centered around 0) little by little until the model reaches some stability (e.g., it runs all the way, or stops giving the above errors).
  2. Once it can run, let the model run completely, and then investigate the trace - using pm.traceplot() or anything else - and use that to decide on justifiably even tighter priors to improve one’s model further.

What kind of priors have you used, if you say you’ve already narrowed them down? I feel you might be able to narrow them down further. If not, then you can also

  1. reparametrize the model, or
  2. use different init options for your Pymc3 sampler,

which have also helped me.



Thanks for your help. Unfortunately there doesn’t seem to be much I can do with the priors to make it run. The parameter I’m estimating has to take a value between 0 and 1, so I’ve used a beta distribution. Regardless of whether I set its parameters to 1 (i.e. completely flat), or ~50 (centered very sharply on 0.5), the same problem occurs.

I’ve also tried using different init options as I’d read elsewhere that jitter can cause problems, however this also doesn’t seem to help.

As far as I can see that just leaves me with the option of reparameterising the model, but I’m not sure of the best way to go about this!



The model in the code itself looks quite good, though I suspect that there is something subtly wrong. Playing around a little bit with the prior:

alpha_err = pm.Normal('alpha_err', 0, 1., shape=(15,))
alpha = pm.Deterministic('alpha', 0.5 + 0.01 * alpha_err)

leads to a model that actually diverges with ADVI, and under NUTS gives

Traceback (most recent call last):
  File "scan_perform.pyx", line 397, in theano.scan_module.scan_perform.perform
IndexError: index 4 is out of bounds for size 4

(could it be that choice = prob.shape[0] - T.sum(rand <= cumsum, axis=0) can lead to a bad index?)

An alternative might be to unwrap the scan into the full Theano graph, something like:

def E_(i, k):
    u = tt.zeros(k)
    return tt.set_subtensor(u[i], 1)

T_ = 8
K_ = 3
with pm.Model() as banditry:
    sam_Qs = list()
    samp_alphas = list()
    for sample_idx in range(K_):
        a = pm.Beta('alpha_{}'.format(sample_idx), 10, 10)
        Q = [pm.Normal('Q_{}_prior'.format(sample_idx), mu=1., sd=0.1, shape=(4,))]
        for t in range(T_):
            choice = pm.Categorical('choice_{}_{}'.format(sample_idx, t), tt.exp(2*Q[t])) #, observed=responses[t,0,sample_idx]
            pe = pm.Deterministic('pe_{}_{}'.format(sample_idx, t), 
                                 (state_rewards[choice] - state_costs[choice] - Q[t][choice]) * E_(choice, 4))
            Qnew = pm.Deterministic('Q_{}_{}'.format(sample_idx, t), Q[t] + pe * a)
    pps = pm.sample_prior_predictive(1000)
    #trace = pm.sample(500)

but oddly enough I’m getting an error that the inputs can’t be tracked ?

ValueError: Cannot resolve inputs for ['Q_0_0', 'pe_0_1', 'Q_0_1', 'pe_0_2', 'Q_0_2', 'pe_0_3', 'Q_0_3', 'pe_0_4', 'Q_0_4', 'pe_0_5', 'Q_0_5', 'pe_0_6', 'Q_0_6', 'pe_0_7', 'Q_0_7', 'Q_1_0', 'pe_1_1', 'Q_1_1', 'pe_1_2', 'Q_1_2', 'pe_1_3', 'Q_1_3', 'pe_1_4', 'Q_1_4', 'pe_1_5', 'Q_1_5', 'pe_1_6', 'Q_1_6', 'pe_1_7', 'Q_1_7', 'Q_2_0', 'pe_2_1', 'Q_2_1', 'pe_2_2', 'Q_2_2', 'pe_2_3', 'Q_2_3', 'pe_2_4', 'Q_2_4', 'pe_2_5', 'Q_2_5', 'pe_2_6', 'Q_2_6', 'pe_2_7', 'Q_2_7']

so perhaps this approach won’t work (but it should)



Based on what chartl said, I would suspect something is wrong with your code, or you found an edge case in Pymc3 (which is certainly possible). I would recommend playing around with your code as much as you can till you localize your error?



Thanks both!

@chartl I think the problem with the first approach giving index errors is due to to alpha going out of bounds, which in turn leads to NaNs. This line

choice = prob.shape[0] - T.sum(rand <= cumsum, axis=0)

Then gives a choice of 4 because cumsum (which is now NaN) is never greater than the random number (so we get a shape of 4 minus 0).

I tried switching the Normal to a bounded Normal to prevent alpha going out of bounds using the following:

BoundedNormal = pm.Bound(pm.Normal, lower=-0.5, upper=0.5)
alpha_err = BoundedNormal('alpha_err', mu=1.0, sd=1, shape=(15,))

This doesn’t produce any index errors, however I still get the same mass matrix error.

I’ll have a go at unwrapping the scan loop - it seems to me like there must be a simpler solution though, especially as similar models seem to work fine (e.g. Modeling reinforcement learning of human participant using PyMC3).

@Gon_F My feeling is that there must be something wrong with the code - I’ve done a lot of playing around though and can’t seem to find anything!