How to vectorize indexing of variable for pm.math.sum?

Hi all!
I’m working on a model with an ordered categorical predictor (E below). So, the associated parameter is incrementally added to the linear model: when E==0, delta_e = 0; when E==1, delta_e = 0 + delta_e1; when E==2, delta_e = 0 + delta_e1 + delta_e2, etc. In code:

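import numpy as np
import pymc3 as pm
import theano.tensor as tt

# E (ordered category codes), A (predictor) and R (observed responses)
# are data arrays defined elsewhere.
with pm.Model() as m:
    kappa = pm.Normal(
        'kappa', 0., 1.5,
        transform=pm.distributions.transforms.ordered,
        shape=6, testval=np.arange(6) - 2.5)
    bA = pm.Normal('bA', 0., 1.)
    bE = pm.Normal('bE', 0., 1.)

    delta = pm.Dirichlet("delta", np.repeat(2., 7), shape=7)
    # Prepend a zero so that the first category (E == 0) adds nothing.
    delta_j = tt.concatenate([tt.zeros(1), delta])

    # This is the line that fails: E is an array, not a scalar slice bound.
    phi = bE * pm.math.sum(delta_j[: E]) + bA * A

    resp_obs = pm.OrderedLogistic(
        'resp_obs', phi, kappa,
        observed=R
    )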

This however yields a ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
A workaround is to use a list comprehension:

delta_sum = tt.as_tensor_variable([pm.math.sum(delta_j[: E[i] + 1]) for i in range(len(E))])
phi = bE * delta_sum + bA * A

But there are about 10_000 data points, so this yields an Exception: ('Compilation failed (return status=1).
Does someone see how to vectorize that operation (or another workaround)?

Thanks a lot in advance, and stay safe!

Try computing the cumsum first, and then index into the resulting cumsum vector.

Thanks Junpeng!
Not sure I understand though: isn’t that what I’m doing with:

delta_sum = tt.as_tensor_variable([pm.math.sum(delta_j[: E[i] + 1]) for i in range(len(E))])
phi = bE * delta_sum + bA * A

?

Would this work?

delta = pm.Dirichlet("delta", np.repeat(2., 7), shape=7)
delta_j = tt.concatenate([tt.zeros(1), delta])
delta_j_cumulative = tt.cumsum(delta_j)

# delta_j_cumulative[E] equals sum(delta_j[: E + 1]), since delta_j[0] is the prepended zero.
phi = bE * delta_j_cumulative[E] + bA * A

Oh, you’re right, that’s the meaning of Junpeng’s answer.
That works, thanks Nicholas!
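
For anyone finding this thread later, here is a minimal NumPy sketch of why the cumsum trick matches the list comprehension from the question. It is only an illustration under assumptions: the data are randomly generated stand-ins (not the real E), and E is assumed to take integer codes 0 to 7. The point is that sum(delta_j[: E[i] + 1]) is the same number as cumsum(delta_j)[E[i]], so one cumulative sum plus integer-array indexing replaces the whole loop. tt.cumsum and tensor indexing should follow the same logic inside the model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the thread's variables (not the real data).
delta = rng.dirichlet(np.repeat(2.0, 7))         # 7 increments that sum to 1
delta_j = np.concatenate([np.zeros(1), delta])   # prepend 0 for the E == 0 level
E = rng.integers(0, 8, size=10_000)              # assumed category codes 0..7

# Loop version from the question: one slice-and-sum per observation.
loop_sum = np.array([delta_j[: e + 1].sum() for e in E])

# Vectorized version: a single cumulative sum, then integer-array indexing.
delta_cum = np.cumsum(delta_j)
vec_sum = delta_cum[E]

# Both versions agree up to floating-point rounding.
print(np.allclose(loop_sum, vec_sum))
```

Because the cumulative sum is computed once on an 8-element vector, the cost no longer grows with a separate graph node per data point, which is presumably what made the list-comprehension version fail to compile.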