I see a few potential problems.
- It looks to me like you're sharing a single prior on `{key}_logit`'s `mu` and `{key}_logit`'s `sd` across all stages. Is this intentional? You may want to create these priors per stage instead.
- You're treating this essentially as hierarchical logistic regression, right? You may need to enforce something like sum-to-zero constraints on the coefficients. This notebook has some examples that might help you get started.
- (edit) Also, on further thought, I'm not sure \frac{e}{1 + e} is the right value for the prior mean - it should probably just be 0, and you may want to lose the hyperprior `{key}_mu` entirely.
- (edit) There's also no intercept here - that might be a helpful thing to include.
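On the sum-to-zero point: with one coefficient per level of each categorical plus a free intercept, the model is overparameterized - adding a constant to every level's logit and subtracting it from the intercept leaves the likelihood unchanged. Centering the coefficients removes that degeneracy. A minimal NumPy sketch of the idea (the names here are illustrative, not taken from the model below):

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=5)        # unconstrained per-level coefficients
centered = raw - raw.mean()     # sum-to-zero version
intercept = raw.mean()          # the shared offset moves into the intercept

# The linear predictor is identical under either parameterization.
levels = rng.integers(0, 5, size=20)
theta_raw = raw[levels]
theta_centered = intercept + centered[levels]

assert np.isclose(centered.sum(), 0.0)
assert np.allclose(theta_raw, theta_centered)
```

In PyMC3 you can apply the same centering to the raw `Normal` coefficients with a `pm.Deterministic`, which is a soft way to get the constraint without a custom transform.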
```python
import pymc3 as pm
import theano.tensor as T

with pm.Model() as model:
    deterministics = {}
    dists = {}
    thetas = []
    for key in data.keys():
        if key != target:
            # Edit: may want to replace this with mu = 0
            mu = pm.StudentT(f'{key}_mu', nu=1, mu=0, sd=1)
            sig = pm.HalfStudentT(f'{key}_sig', sd=1)
            # one coefficient per level of this categorical column
            dists[key] = pm.Normal(f'{key}_logit', mu=mu, sd=sig, shape=df[key].nunique())
            deterministics[key] = pm.Deterministic(key, pm.invlogit(dists[key]))
            thetas.append(dists[key][df[key].values])
    intercept = pm.Normal('intercept', mu=0, sd=1)
    theta = intercept + T.sum(thetas, axis=0)
    # likelihood of observed data
    fault = pm.Bernoulli(target, p=pm.invlogit(theta), observed=df[target].values)
```
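To make the prior-mean edit concrete: a logit prior centered at 0 is symmetric around a probability of 0.5, which is usually the neutral default for a Bernoulli likelihood, whereas centering at e/(1 + e) ≈ 0.731 would bias every level's probability upward. A quick check:

```python
import math

def invlogit(x):
    """Inverse logit: maps a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

p_zero = invlogit(0.0)                       # prior centered at 0 -> p = 0.5
p_biased = invlogit(math.e / (1 + math.e))   # centered at e/(1+e) -> p ≈ 0.675

assert p_zero == 0.5
assert abs(p_biased - 0.675) < 1e-3
```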