Ok so I think I managed to write an intercept-only model that accounts for the missing categories, thanks to your advice n°1 @junpenglao!
However, before claiming victory, I’d like to add a slope and test the model against simulated data. To do that, I’m using an MvNormal, so that the model accounts for potential covariation between intercepts and slopes:
with pm.Model() as m_slope:
# cov matrix:
sd_dist = pm.Exponential.dist(1)
packed_chol = pm.LKJCholeskyCov('chol_cov', eta=4, n=2, sd_dist=sd_dist, shape=Ncategories - 1)
chol = pm.expand_packed_triangular(2, packed_chol, lower=True, shape=Ncategories - 1)
# Extract rho and sds:
cov = pm.math.dot(chol, chol.T)
sigma_ab = pm.Deterministic('sigma_district', tt.sqrt(tt.diag(cov)))
corr = tt.diag(sigma_ab**-1).dot(cov.dot(tt.diag(sigma_ab**-1)))
r = pm.Deterministic('Rho', corr[np.triu_indices(2, k=1)])
# means:
ab = pm.Normal('ab', mu=[-1.8, -0.5] , sd=[0.1, 0.2], shape=(Ncategories - 1, 2))
# varying effects:
ab_cluster = pm.MvNormal('ab_cluster', mu=ab, chol=chol, shape=(Nclusters, Ncategories - 1, 2))
mu = ab_cluster[cluster_id, :, 0] + ab_cluster[cluster_id, :, 1] * predictor
My goal is to get the mu matrix with all clusters and categories, that I can then pass to softmax (not shown here). But how do I get there, as pm.LKJCholeskyCov doesn’t accept a shape argument? 
Is there an elegant solution, or do I have to do a loop to construct the mu matrix progressively?
Thank you very much in advance for your help 