Reinforcement learning model - derivative of RV is zero

Thanks both!

@chartl I think the index errors in the first approach are due to alpha going out of bounds, which in turn leads to NaNs. This line

choice = prob.shape[0] - T.sum(rand <= cumsum, axis=0)

then gives a choice of 4, because every comparison against cumsum (which is now NaN) is false, so the sum is 0 and we get a shape of 4 minus 0.
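To make the NaN behaviour concrete, here is a minimal sketch in plain Python rather than Theano. It is not the model code: `sample_choice` is a hypothetical helper, and the four probabilities and the random draw of 0.25 are made-up values chosen only to mirror the expression above.

```python
import math
from itertools import accumulate


def sample_choice(probs, rand):
    """Plain-Python analogue of: prob.shape[0] - T.sum(rand <= cumsum, axis=0)."""
    cumsum = list(accumulate(probs))
    # Count how many cumulative sums are at or above the random draw.
    n_at_or_above = sum(rand <= c for c in cumsum)
    return len(probs) - n_at_or_above


# Valid probabilities: the result is an index in 0..3.
print(sample_choice([0.1, 0.2, 0.3, 0.4], 0.25))  # 1

# NaN probabilities: every comparison with NaN is False, so nothing is
# counted and the result is 4, one past the last valid index.
print(sample_choice([math.nan] * 4, 0.25))  # 4
```

With four options the valid indices are 0 to 3, so a choice of 4 is exactly what would trigger an index error further along.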

I tried switching the Normal to a bounded Normal to prevent alpha going out of bounds using the following:

BoundedNormal = pm.Bound(pm.Normal, lower=-0.5, upper=0.5)
alpha_err = BoundedNormal('alpha_err', mu=1.0, sd=1, shape=(15,))

This doesn’t produce any index errors; however, I still get the same mass matrix error.

I’ll have a go at unwrapping the scan loop - it seems to me like there must be a simpler solution though, especially as similar models seem to work fine (e.g. Modeling reinforcement learning of human participant using PyMC3).

@Gon_F My feeling is that there must be something wrong with the code - I’ve done a lot of playing around though and can’t seem to find anything!