Hi all,
I’m trying to calculate the posterior predictive mean for a non-centered hierarchical logistic regression model I’ve fit. For context, the model has:

- a single continuous regressor `x`
- a categorical feature `categ` with values [0 … 14]
- a binary outcome vector `y_train`
- an offset to both the intercept and slope terms in my regression for each of the `categ` values

Further, I have reason to believe that the two offsets for each category are related, so I’m using a covariance matrix to reflect that. I’ve followed along with a couple of different examples in the PyMC documentation, but I could use some help piecing together the posterior predictive likelihood. Here’s my model setup (not reproducible as written):
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

categ_idx = [....]
n_categories = 15

coords = {
    "categ": list(range(n_categories)),
    "param": ["a", "b"],
    "param_bis": ["a", "b"],
}

with pm.Model(coords=coords) as logistic_model:
    sd_dist = pm.Exponential.dist(0.5)
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=sd_dist, compute_corr=True
    )
    a = pm.Normal("a", mu=0.0, sigma=25.0)
    b = pm.Normal("b", mu=0.0, sigma=25.0)
    z = pm.Normal("z", 0.0, 1.0, dims=("param", "categ"))
    ab_categ = pm.Deterministic("ab_categ", tt.dot(chol, z).T, dims=("categ", "param"))

    lin_pred = ((a + ab_categ[categ_idx.values, 0]) +
                (b + ab_categ[categ_idx.values, 1]) * X_train['x'].values)

    # Data likelihood
    likeli = pm.Bernoulli('likeli', logit_p=lin_pred, observed=y_train.values)

    trace = pm.sample(1000, tune=1000, target_accept=0.95)

burnt_trace = trace[200:]
```
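For anyone skimming, the core of the non-centered parameterization above is that the per-category offsets are built as `(chol @ z).T` from standard-normal draws. A minimal NumPy sketch (hypothetical values, not my fitted ones) showing how that construction induces correlated `a`/`b` offsets:

```python
import numpy as np

# Hypothetical Cholesky factor standing in for the sampled `chol`;
# the implied covariance of each (a_offset, b_offset) pair is chol @ chol.T.
rng = np.random.default_rng(0)
n_categories = 15

chol = np.array([[1.0, 0.0],
                 [0.8, 0.6]])               # lower-triangular, 2 x 2
z = rng.standard_normal((2, n_categories))  # standard-normal "raw" offsets

ab_categ = (chol @ z).T   # shape (n_categories, 2), one (a, b) offset pair per category
cov = chol @ chol.T       # implied 2 x 2 covariance of the offsets

print(ab_categ.shape)  # (15, 2)
print(cov)             # [[1.  0.8], [0.8 1. ]]
```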
The code below is what I have so far for the posterior predictive likelihood. Its output looks reasonable, but I don’t understand the indexing into `burnt_trace['ab_categ']` and really only got this working through trial and error. Before I move on to higher-dimensional problems, I’d like to know what each of the indices `a, b, c, d` in `burnt_trace['ab_categ'][a, b, c][d]` corresponds to.
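My current understanding (an assumption, based on the trace stacking posterior draws along the leading axis) is that `burnt_trace['ab_categ']` has shape `(n_draws, n_categories, n_params)`, matching the `dims=("categ", "param")` declaration. A NumPy mock illustrating what each index then selects:

```python
import numpy as np

# Mock array standing in for burnt_trace['ab_categ'], assuming shape
# (n_draws, n_categories, n_params).
n_draws, n_categories, n_params = 800, 15, 2
ab = np.arange(n_draws * n_categories * n_params, dtype=float).reshape(
    n_draws, n_categories, n_params)

categ_index = 3  # hypothetical category of interest

# Under this assumption, in ab[a, b, c]:
#   a -> posterior draw, b -> category, c -> parameter (0 = "a", 1 = "b")
# so ab[categ_index, 0, :] actually selects DRAW number `categ_index`,
# category 0, and both parameters -- the trailing [0] / [1] then pick the
# parameter, not the draw.
sel = ab[categ_index, 0, :]
print(sel.shape)  # (2,)

# To get all draws of the "a" offset for one category, index draws with ":" instead:
a_offsets = ab[:, categ_index, 0]
print(a_offsets.shape)  # (800,)
```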
```python
lin_pred = ((burnt_trace['a'] + burnt_trace['ab_categ'][categ_index, 0, :][0]) +
            (burnt_trace['b'] + burnt_trace['ab_categ'][categ_index, 0, :][1]) * x)
prob = (1 / (1 + np.exp(-lin_pred))).mean()
```
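For comparison, here is the vectorized version I think I’m aiming for: keep all draws along the leading axis, index only the category and parameter, and average the sigmoid over draws at the end. The arrays below are random stand-ins for the real trace (again assuming draws are on the leading axis):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_categories = 800, 15

# Stand-ins for burnt_trace['a'], burnt_trace['b'], burnt_trace['ab_categ']
a = rng.normal(size=n_draws)
b = rng.normal(size=n_draws)
ab = rng.normal(size=(n_draws, n_categories, 2))

categ_index = 3   # hypothetical category
x = 0.7           # hypothetical regressor value

# One linear predictor per posterior draw for this category,
# then the posterior predictive mean of the success probability.
lin_pred = (a + ab[:, categ_index, 0]) + (b + ab[:, categ_index, 1]) * x
prob = (1.0 / (1.0 + np.exp(-lin_pred))).mean()

print(lin_pred.shape)    # (800,)
print(0.0 < prob < 1.0)  # True
```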
This example is where I drew the inspiration for the covariance structure.