The model in the code itself looks quite good, though I suspect that there is something subtly wrong. Playing around a little bit with the prior:
alpha_err = pm.Normal('alpha_err', 0, 1., shape=(15,))
alpha = pm.Deterministic('alpha', 0.5 + 0.01 * alpha_err)
leads to a model that actually diverges with ADVI, and under NUTS gives
Traceback (most recent call last):
File "scan_perform.pyx", line 397, in theano.scan_module.scan_perform.perform
IndexError: index 4 is out of bounds for size 4
(could it be that choice = prob.shape[0] - T.sum(rand <= cumsum, axis=0)
can lead to a bad index?)
An alternative might be to unwrap the scan
into the full Theano graph, something like:
def E_(i, k):
u = tt.zeros(k)
return tt.set_subtensor(u[i], 1)
T_ = 8
K_ = 3
with pm.Model() as banditry:
sam_Qs = list()
samp_alphas = list()
for sample_idx in range(K_):
a = pm.Beta('alpha_{}'.format(sample_idx), 10, 10)
Q = [pm.Normal('Q_{}_prior'.format(sample_idx), mu=1., sd=0.1, shape=(4,))]
for t in range(T_):
choice = pm.Categorical('choice_{}_{}'.format(sample_idx, t), tt.exp(2*Q[t])) #, observed=responses[t,0,sample_idx]
pe = pm.Deterministic('pe_{}_{}'.format(sample_idx, t),
(state_rewards[choice] - state_costs[choice] - Q[t][choice]) * E_(choice, 4))
Qnew = pm.Deterministic('Q_{}_{}'.format(sample_idx, t), Q[t] + pe * a)
Q.append(Qnew)
samp_alphas.append(a)
sam_Qs.append(Q)
pps = pm.sample_prior_predictive(1000)
#trace = pm.sample(500)
but oddly enough I’m getting an error that the inputs can’t be tracked ?
ValueError: Cannot resolve inputs for ['Q_0_0', 'pe_0_1', 'Q_0_1', 'pe_0_2', 'Q_0_2', 'pe_0_3', 'Q_0_3', 'pe_0_4', 'Q_0_4', 'pe_0_5', 'Q_0_5', 'pe_0_6', 'Q_0_6', 'pe_0_7', 'Q_0_7', 'Q_1_0', 'pe_1_1', 'Q_1_1', 'pe_1_2', 'Q_1_2', 'pe_1_3', 'Q_1_3', 'pe_1_4', 'Q_1_4', 'pe_1_5', 'Q_1_5', 'pe_1_6', 'Q_1_6', 'pe_1_7', 'Q_1_7', 'Q_2_0', 'pe_2_1', 'Q_2_1', 'pe_2_2', 'Q_2_2', 'pe_2_3', 'Q_2_3', 'pe_2_4', 'Q_2_4', 'pe_2_5', 'Q_2_5', 'pe_2_6', 'Q_2_6', 'pe_2_7', 'Q_2_7']
so perhaps this approach won’t work (but it should)