I’m using profiling on a large model to attempt to speed sampling.

```
func = model.logp_dlogp_function(profile=True)
func.set_extra_values({})
x = np.random.randn(func.size)
for _ in range(1000):
func(x)
print(func.profile.summary())
```

A segment of the output is below:

```
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
72.9% 72.9% 14.346s 4.78e-03s Py 3000 3 theano.tensor.subtensor.AdvancedIncSubtensor
8.1% 81.1% 1.602s 1.60e-03s C 1000 1 theano.tensor.nnet.nnet.Softmax
7.0% 88.1% 1.383s 3.74e-05s C 37000 37 theano.tensor.elemwise.Elemwise
4.7% 92.8% 0.919s 9.19e-04s C 1000 1 theano.tensor.subtensor.AdvancedIncSubtensor1
4.0% 96.8% 0.790s 2.63e-04s Py 3000 3 theano.tensor.subtensor.AdvancedSubtensor
```

Notice that super yucky 14.3 seconds/it command (theano.tensor.subtensor.AdvancedIncSubtensor) taking up 73% of computation time. Its also a py op.

Its comes from a single line in my model, a paired down version is below:

```
idx1 = shared(np.random.int(0,4,(20000,0)))
idx2 = shared(np.random.int(0,4,(20000,0)))
with pm.Model() as mod:
a_multi_dim_param = pm.Normal('a_multi_dim_param', 0, 1, shape=(4,4,4))
a_multi_dim_param[idx1, idx2] #AdvancedIncSubtensor
#....
```

Questions:

Why is this a py op? and is there anything that can be done to speed it up? a reshape maybe?