You’re right that the intercept isn’t necessary, since the cutpoints already absorb it. It’s more of a placeholder for the linear-model terms, which I stuck in for mental clarity. McElreath’s formulation uses the inverse logit, estimates all K-1 cutpoints, and subtracts the linear predictor (the coefficient terms, with no intercept) from each cutpoint.
So in my example above you could expand the deterministic η line with the linear-predictor terms. Note that the whole predictor is subtracted, so it needs parentheses:
η = pm.Deterministic('η', pm.math.invlogit(cut - (β1*x1 + β2*x2)))
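To make the arithmetic concrete, here is a minimal stdlib-only sketch (plain Python, not PyMC) of what that line computes under McElreath’s cumulative-logit formulation. With K-1 ordered cutpoints α_k and linear predictor φ = β1·x1 + β2·x2, P(y ≤ k) = invlogit(α_k − φ), and differencing adjacent cumulative probabilities gives the K category probabilities. The cutpoint, coefficient, and predictor values are made up for illustration.

```python
import math

def invlogit(z):
    """Logistic sigmoid, the inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(cutpoints, phi):
    """Category probabilities for an ordered-logit model.

    cutpoints: K-1 increasing cutpoints (alpha_1 < ... < alpha_{K-1}).
    phi: linear predictor, e.g. beta1*x1 + beta2*x2 (no intercept).
    Returns a list of K probabilities that sum to 1.
    """
    # Cumulative probabilities P(y <= k) = invlogit(alpha_k - phi);
    # the whole linear predictor is subtracted from every cutpoint.
    cum = [invlogit(a - phi) for a in cutpoints]
    # Pad with P(y <= 0) = 0 and P(y <= K) = 1, then take differences.
    cum = [0.0] + cum + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

# Hypothetical values: 4 response categories -> 3 cutpoints.
cutpoints = [-1.0, 0.5, 2.0]
beta1, beta2 = 0.8, -0.3
x1, x2 = 1.0, 2.0
phi = beta1 * x1 + beta2 * x2  # roughly 0.2

probs = ordered_logit_probs(cutpoints, phi)
print([round(p, 3) for p in probs])
```

Because φ is subtracted from every cutpoint, increasing it lowers all the cumulative probabilities and pushes mass toward the higher categories; flipping the sign of one coefficient’s term silently changes that coefficient’s meaning.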
Kruschke, on the other hand, fixes two cutpoints (the first and last) to constants on the scale of the responses and estimates the others. The latent distribution is normal, so this is an ordered probit. Its sigma is also estimated, and its mean is the linear predictor. I’m still trying to get my thoughts clear on all of this and will try to write a blog post or similar soon.
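For comparison, here is a minimal sketch of a Kruschke-style ordered-probit likelihood. It assumes his convention of fixing θ_1 = 1.5 and θ_{K-1} = K − 0.5 for responses coded 1..K. The interior thresholds, μ, and σ below are hypothetical stand-ins for values that would be estimated.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ordered_probit_probs(thresholds, mu, sigma):
    """Category probabilities when a latent N(mu, sigma) is cut at thresholds.

    thresholds: K-1 increasing cutpoints on the response scale.
    mu: latent mean (the linear predictor).
    sigma: latent standard deviation (estimated, not fixed at 1).
    """
    # P(y <= k) = Phi((theta_k - mu) / sigma)
    cum = [normal_cdf((t - mu) / sigma) for t in thresholds]
    # Pad with 0 and 1, then difference to get per-category probabilities.
    cum = [0.0] + cum + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]

K = 5  # hypothetical 1..5 rating scale
# First and last thresholds pinned to constants on the response scale;
# the interior ones (2.4, 3.6) are hypothetical stand-ins for estimates.
thresholds = [1.5, 2.4, 3.6, K - 0.5]
mu, sigma = 3.2, 1.1  # hypothetical latent mean and sd

probs = ordered_probit_probs(thresholds, mu, sigma)
print([round(p, 3) for p in probs])
```

Pinning two thresholds ties the latent scale’s location and spread to the response scale, which is what lets σ be estimated alongside μ. In the McElreath-style logit version, the latent scale is instead fixed (a unit logistic) and all K-1 cutpoints are free.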