Hello dear PyMCommunity!
Context
As an intellectual challenge, I’m trying to model elections in Paris at the districtlevel (there are 20 “arrondissements”) – this is an excuse to use predictors at the districtlevel and the citylevel, just as in the radon NB, where the uranium predictor is used at the countylevel.
I put the data and model in this gist, but here is the model:
with pm.Model() as m_h_intercept_polls:
mus_parties = []
for p_id, p in enumerate(PARTIES[:1]):
city_a = pm.Normal(f"city_a_{p}", mu=1.8, sigma=1.0)
city_b = pm.Normal(f"city_b_{p}", mu=0.0, sigma=0.1, shape=2)
σ_a = pm.HalfNormal(f"σ_a_{p}", 1.0)
a = city_a + city_b[0] * city_unemp + city_b[1] * polls[:, p_id]
za_district = pm.Normal(
f"za_district_{p}", 0.0, 1.0, shape=(Ndistricts, Ndates)
)
a_district = pm.Deterministic(f"a_district_{p}", a + za_district * σ_a)
b = pm.Normal(f"b_{p}", 0.0, 0.1, shape=2)
# expected value per district:
mu = a_district[district_id, date_id] + pm.math.dot(district_preds, b)
mus_parties.append(mu)
mus_parties = tt.as_tensor_variable(mus_parties).T
# append last category:
vary_pivot = tt.as_tensor_variable(
np.full(shape=(Ndistricts, Ndates, 1), fill_value=2.2)
)
mus_parties = tt.horizontal_stack(mus_parties, vary_pivot[district_id, date_id])
# preferences of each district, for all elections:
latent_p = pm.Deterministic("latent_p", tt.nnet.softmax(mus_parties))
# zeroinflation process:
# keep only preferences for available parties:
slot_prob = parties_available * latent_p
# normalize preferences:
slot_prob = pm.Deterministic(
"slot_prob", slot_prob / tt.sum(slot_prob, axis=1, keepdims=True)
)
R = pm.Multinomial("R", n=N, p=slot_prob, observed=R_obs)
The process can be complicated so I had to do some modeling choices that are too long to detail here but I welcome any question.
Problem
This model samples without divergences, but sampling is very long and NUTS diagnostics aren’t good:
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
The absence of divergence is puzzling to me, as there is clearly a problem with the citylevel parameters – the rest is in the gist, but here is a sample:
Questions

I tried with different priors, used a noncentered parametrization, increased
target_accept
and the number of tuning steps – nothing seemed to work. Does it just mean this is a case where these parameters can’t be estimated with these data? Or is there something wrong with my implementation? 
The uranium example only has one obs per household. Here, I have 12 obs (i.e elections) per district, so I had to include this dimension in the model – did I do it wrong?
I’m pretty sure I’m doing something wrong but I can’t see where, as I’ve never seen this case when the sampler doesn’t converge but there are no divergences
I hope it’s clear and I’m really grateful for any help!
PyMCheers from Paris