I keep running into a frustrating "index out of bounds" error when specifying a hierarchical logistic model.
I followed the specification junpenglao gave in this question: Prediction using sample_ppc in Hierarchical model - Questions - PyMC Discourse. However, it did not work in my case.
Here is my code:
X_train_hier, X_test_hier, y_train_hier, y_test_hier = train_test_split(X_hier, y, test_size=.15, random_state=seed)
X_idx = theano.shared(np.asarray(X_train_hier.uid))
X_user = len(np.unique(np.asarray(X.uid)))
X_train_hier = X_train_hier.drop("uid", axis=1)
X_len = len(X_train_hier.keys())
X_data = theano.shared(X_train_hier)
with pm.Model() as hier:
    # hyperpriors
    mu_a = pm.Normal("mu_a", 0.0, 1e4)
    sigma_a = pm.Exponential("sigma_a", 1e4)
    mu_b = pm.Normal("mu_b", 0.0, 1e4)
    sigma_b = pm.Exponential("sigma_b", 1e4)
    # intercept prior for each user
    a = pm.Normal("a", mu_a, sigma_a, shape=X_user)
    # coefficient prior for each user
    b = pm.Normal("b", mu_b, sigma_b, shape=(X_user, X_len))
    # likelihood
    theta = pm.Deterministic("theta", pm.invlogit(a[X_idx] + b[X_idx] * X_data.T))
    # observation from sigmoid
    obs = pm.Bernoulli(name="AT", p=theta, observed=y_train_hier)
And here is the error message I got:
IndexError Traceback (most recent call last)
in
14
15 #likelihood
---> 16 theta = pm.Deterministic("theta", pm.invlogit(b[X_idx] * X_data.T))
17
18 #observation from sigmoid
~\AppData\Roaming\Python\Python38\site-packages\theano\tensor\var.py in __getitem__(self, args)
572 # `take` function/Op serves exactly this type of indexing,
573 # so we simply return its result.
--> 574 return self.take(args[axis], axis)
575 else:
576 return theano.tensor.subtensor.advanced_subtensor(self, *args)
~\AppData\Roaming\Python\Python38\site-packages\theano\tensor\var.py in take(self, indices, axis, mode)
621
622 def take(self, indices, axis=None, mode="raise"):
--> 623 return theano.tensor.subtensor.take(self, indices, axis, mode)
624
625 def copy(self, name=None):
~\AppData\Roaming\Python\Python38\site-packages\theano\tensor\subtensor.py in take(a, indices, axis, mode)
2522 return advanced_subtensor1(a.flatten(), indices)
2523 elif axis == 0:
-> 2524 return advanced_subtensor1(a, indices)
2525 else:
2526 if axis < 0:
~\AppData\Roaming\Python\Python38\site-packages\theano\graph\op.py in __call__(self, *inputs, **kwargs)
251
252 if config.compute_test_value != "off":
--> 253 compute_test_value(node)
254
255 if self.default_output is not None:
~\AppData\Roaming\Python\Python38\site-packages\theano\graph\op.py in compute_test_value(node)
128 thunk.outputs = [storage_map[v] for v in node.outputs]
129
--> 130 required = thunk()
131 assert not required # We provided all inputs
132
~\AppData\Roaming\Python\Python38\site-packages\theano\graph\op.py in rval()
604
605 def rval():
--> 606 thunk()
607 for o in node.outputs:
608 compute_map[o][0] = True
~\AppData\Roaming\Python\Python38\site-packages\theano\link\c\basic.py in __call__(self)
1769 print(self.error_storage, file=sys.stderr)
1770 raise
-> 1771 raise exc_value.with_traceback(exc_trace)
1772
1773
IndexError: index 7420 is out of bounds for axis 0 with size 7420
I played around with the model, and the problem seems to be with X_idx. However, I assigned X_idx exactly the way junpenglao did in Prediction using sample_ppc in Hierarchical model - Questions - PyMC Discourse. I tried the sample code from that question and it worked, but on my own dataset it does not.
I am sorry that I cannot upload part of my data because of confidentiality. However, I can provide the dimensions: X_idx has 70345 entries (the number of rows in the training dataset), and X_user is 7420 (the number of unique uid values in the entire dataset).
I have been stuck on this for days. I would really appreciate some help! @junpenglao
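One thing the error message itself suggests: X_idx contains the value 7420, while a and b only have positions 0 through 7419. That would happen if the raw uid values are not contiguous zero-based codes (for example, if they start at 1 or have gaps). This is an assumption, not a confirmed diagnosis. A minimal NumPy sketch of the check and of one possible remapping is below; the uid values in it are made up for illustration and are not the real data.

```python
import numpy as np

# Hypothetical uid values: 1-based and with a gap, so they are not valid
# zero-based positions into an array of length n_users.
uid = np.array([1, 3, 3, 4, 1, 4])

n_users = len(np.unique(uid))  # 3 distinct users
# Indexing with uid directly requires 0 <= uid < n_users for every row.
print(uid.min(), uid.max(), n_users)

# Remap each uid to a contiguous code in 0..n_users-1.
unique_uids, uid_codes = np.unique(uid, return_inverse=True)

a = np.zeros(n_users)              # stand-in for the per-user intercepts
per_row_intercepts = a[uid_codes]  # one intercept per data row

# Indexing with the raw uid values fails, as in the traceback above.
try:
    a[uid]
    raw_index_ok = True
except IndexError:
    raw_index_ok = False
```

If the real uid column behaves like this, the codes (rather than the raw uid values) would be what goes into theano.shared for X_idx. Computing the codes on the full dataset before the train/test split keeps the mapping consistent between the two sets.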