Calculating the Posterior Predictive Likelihood for a Logistic Regression With LKJ Covariance

Hi all,

I’m trying to calculate the posterior predictive mean for a non-centered hierarchical logistic regression model I’ve fit. For context, the model has

  • a single continuous regressor x
  • a categorical feature categ with values [0… 14]
  • a binary outcome vector y_train
  • an offset to both the intercept and slope terms, one pair per categ value
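
In other words, the model I’m after is logit(p_i) = (a + a_offset[categ_i]) + (b + b_offset[categ_i]) * x_i, where the pair of offsets for each category is drawn jointly from a multivariate normal whose covariance gets an LKJ prior (via its Cholesky factor).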

Further, I have reason to believe that the two offsets for each category are related, so I’m using a covariance matrix to reflect that. I’ve followed along with a couple of different examples in the PyMC documentation, but I could use some help piecing together the posterior predictive likelihood. Here’s my model setup (not reproducible as written)…

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    categ_idx = [....]  # pandas Series mapping each observation to a category index
    n_categories = 15
    coords = {}
    coords["categ"] = list(range(n_categories))
    coords["param"] = ["a", "b"]
    coords["param_bis"] = ["a", "b"]

    with pm.Model(coords=coords) as logistic_model:
        # Cholesky factor of the 2x2 covariance between the per-category
        # intercept and slope offsets, with an LKJ prior on the correlation
        sd_dist = pm.Exponential.dist(0.5)
        chol, corr, stds = pm.LKJCholeskyCov("chol", n=2, eta=2.0, sd_dist=sd_dist, compute_corr=True)

        # Population-level intercept and slope
        a = pm.Normal("a", mu=0.0, sigma=25.0)
        b = pm.Normal("b", mu=0.0, sigma=25.0)

        # Non-centered parameterization: scale standard-normal draws by chol
        z = pm.Normal("z", 0.0, 1.0, dims=("param", "categ"))
        ab_categ = pm.Deterministic("ab_categ", tt.dot(chol, z).T, dims=("categ", "param"))

        # Linear predictor with a per-category offset on intercept and slope
        lin_pred = ((a + ab_categ[categ_idx.values, 0]) +
                    (b + ab_categ[categ_idx.values, 1]) * X_train['x'].values)

        # Data likelihood
        likeli = pm.Bernoulli('likeli', logit_p=lin_pred, observed=y_train.values)

        trace = pm.sample(1000, tune=1000, target_accept=0.95)
        burnt_trace = trace[200:]  # discard the first 200 draws as extra burn-in
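
As a sanity check on the trace layout, here are the shapes I believe the arrays should have, given the dims declared above (this is my assumption, not something I’ve confirmed):

    # Shapes I expect after slicing off the burn-in:
    print(burnt_trace['a'].shape)         # (n_draws,) -- one scalar per retained draw
    print(burnt_trace['ab_categ'].shape)  # (n_draws, n_categories, 2), from dims ("categ", "param")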

The code below is what I have so far for the posterior predictive likelihood. Its output is reasonable, but I don’t understand the indexing into burnt_trace['ab_categ'] and really only got this working through trial and error. I’d like to know what each index a, b, c, d into the trace array burnt_trace['ab_categ'][a, b, c][d] corresponds to before I move on to higher-dimensional problems.

    lin_pred = ((burnt_trace['a'] + burnt_trace['ab_categ'][categ_index, 0, :][0]) +
                (burnt_trace['b'] + burnt_trace['ab_categ'][categ_index, 0, :][1]) * x)
    prob = (1 / (1 + np.exp(-lin_pred))).mean()
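
For what it’s worth, my best guess (unverified, which is why I’m asking) is that the first axis of burnt_trace['ab_categ'] indexes the posterior draw, the second the category, and the third the parameter (0 = intercept offset, 1 = slope offset). If that’s right, I would have expected the vectorized version to look more like this:

    # Hypothetical indexing, assuming shape (n_draws, n_categories, 2):
    # take every draw for one category, separating intercept (0) and slope (1) offsets
    lin_pred = ((burnt_trace['a'] + burnt_trace['ab_categ'][:, categ_index, 0]) +
                (burnt_trace['b'] + burnt_trace['ab_categ'][:, categ_index, 1]) * x)
    prob = (1 / (1 + np.exp(-lin_pred))).mean()

…but I can’t tell whether that matches what my current code is actually doing, hence the question.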

This example is where I drew the inspiration for the covariance structure.