Update:
I think I found the issue with the above model. It appears I was a bit confused about the role of the ‘shape’ parameter when specifying the observed RVs. This blog post from @lucianopaz really helped clarify things!
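For anyone else tripped up by the same thing, here's a minimal sketch of the pattern (group_idx and n_groups are hypothetical stand-ins): shape matches the number of groups, not the number of observations, and indexing the vector RV produces one value per observation.

import numpy as np
import pymc3 as pm

group_idx = np.array([0, 0, 1, 2, 1])  # hypothetical per-observation group codes
n_groups = 3

with pm.Model():
    a = pm.Normal("a", mu=0, sigma=1, shape=(n_groups,))  # one parameter per group
    p = pm.math.invlogit(a[group_idx])                    # one value per observation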
This is my updated model, which seems to sample better (~1 hr). Moreover, I can now fit it with ‘advi’. There are still 1–2 divergent transitions and I haven’t given it a full run-through with posterior predictive checks, so there might still be a better way to specify things.
import numpy as np
import pymc3 as pm

# sources_encoded, observed_idx, missing_idx, n_sources, and df come from
# the data prep earlier in the thread
with pm.Model() as model2:
    # hyperpriors for the per-source intercepts
    mu_alpha = pm.StudentT("mu_alpha", nu=1, mu=0, sigma=100)
    sigma_alpha = pm.HalfNormal("sigma_alpha", sigma=100)

    # per-observation source codes for the rows where the source is known
    source = pm.intX(pm.Data("source", sources_encoded[observed_idx.flatten()]))

    # one intercept per source, indexed down to one probability per observation
    a = pm.Normal("a", mu=mu_alpha, sigma=sigma_alpha, shape=(n_sources,))
    p = pm.Deterministic("p", pm.math.invlogit(a[source]))
    becameLicensedObs = pm.Binomial(
        "outcomeObs", n=1, p=p,
        observed=df["outcome"].values[observed_idx.flatten()],
    )

    # latent probability that an observation's category is missing | category
    pMiss = pm.Dirichlet("pMiss", a=np.ones(n_sources))

    # components of the mixture generating the observations with
    # unknown categories. This makes the assumption that the missing
    # observations come from one of the categories known to be part of a source,
    # and that missingness might depend on what that category is.
    comps = pm.Binomial.dist(n=1, p=pm.math.invlogit(a), shape=(n_sources,))
    becameLicensedMissing = pm.Mixture(
        "outcomeMissing",
        comp_dists=comps,
        w=pMiss,
        observed=df["outcome"].values[missing_idx.flatten()],
    )
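For reference, this is roughly how I'm fitting it. Just a sketch: the draw/iteration counts are placeholders, but pm.sample, pm.fit, and pm.sample_posterior_predictive are the standard PyMC3 entry points.

with model2:
    # NUTS; bumping target_accept can help with the remaining divergences
    trace = pm.sample(2000, tune=2000, target_accept=0.95)

    # variational alternative: mean-field ADVI
    approx = pm.fit(n=50000, method="advi")
    advi_trace = approx.sample(2000)

    # posterior predictive draws for the checks I still need to do
    ppc = pm.sample_posterior_predictive(trace)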
As a side note: generating the plate notation diagram with graphviz was really helpful in working through this problem, so I thought I would include it here:
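A minimal way to generate such a diagram (assuming PyMC3's built-in pm.model_to_graphviz helper, which returns a graphviz Digraph):

graph = pm.model_to_graphviz(model2)
graph.render("model2_plates", format="png")  # or just evaluate `graph` in a notebook cell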
