Switch-like behaviour for parallel models = optimization failure?

I am quite new to this, so please excuse me if my question sounds stupid.

What I am trying to do is build a rather simple model that approximates several normal distributions, depending on some measured integer value.

So the data could look something like this:
[ (1, 1.2), (0, 0.9), (1, 0.8), (2, 1.3), … ] , so each measurement has the form (k, val).
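
For concreteness, this is roughly how I split those tuples into the two arrays the model below uses (the data here is made up):

import numpy as np

# made-up measurements of the form (k, val)
measurements = [(1, 1.2), (0, 0.9), (1, 0.8), (2, 1.3)]

k_from_above = np.array([k for k, _ in measurements], dtype='int64')
vals_from_above = np.array([val for _, val in measurements])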

Now, I could model each k separately, i.e. have one model per k, but for learning purposes I wanted to try modelling it all in one go, like this:

import pymc3 as pm

with pm.Model() as m:
    observed_vals = pm.Data('obs_v', vals_from_above)
    observed_k = pm.Data('obs_k', k_from_above)

    # for k ranging from zero to four, we need five different mu's:
    mu = pm.Normal('mu', mu=1.0, sigma=1.0, shape=5)

    # k can follow a Poisson distribution:
    k = pm.Poisson('k', mu=0.5, observed=observed_k)
    mu_k = pm.Deterministic('mu_k', mu[k])

    val = pm.Normal('v', mu=mu_k, sigma=0.1, observed=observed_vals)

My assumption was that, at run time, the graph would decide which mu[k] to use for each row, based on that row's observed k. So for each row it would pick one of the five mu values, with k acting as an index into the model / a model switch.
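
In plain numpy terms, the per-row lookup I have in mind is just fancy indexing (a toy illustration, not the PyMC3 graph itself):

import numpy as np

mu_values = np.array([0.8, 1.0, 1.2, 1.4, 1.6])  # one mu per k = 0..4
k_obs = np.array([1, 0, 1, 2])                   # observed k per row
print(mu_values[k_obs])                          # [1.  0.8 1.  1.2]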

When I run this, the compiler complains about an “Optimization failure”.

I suspect this comes down to Theano not being able to compute a gradient through this graph?

Does anyone have an idea how to do this properly, or is this a use case for parallel models only?
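
By "parallel models" I mean fitting one model per k, along these lines (just a sketch, reusing the array names from above):

import pymc3 as pm

models = {}
for k in range(5):
    vals_k = vals_from_above[k_from_above == k]  # rows measured at this k
    if len(vals_k) == 0:
        continue  # no data observed for this k
    with pm.Model() as m_k:
        mu = pm.Normal('mu', mu=1.0, sigma=1.0)
        val = pm.Normal('v', mu=mu, sigma=0.1, observed=vals_k)
    models[k] = m_k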

P.S.: For reference, fixing the index by choosing mu_k = pm.Deterministic('mu_k', mu[0]) works perfectly fine.