Hi all,
I’m trying to calculate the posterior predictive mean for a non-centered hierarchical logistic regression model I’ve fit. For context, the model has:

- a single continuous regressor `x`
- a categorical feature `categ` with values [0 … 14]
- a binary outcome vector `y_train`
- an offset to both the intercept and slope terms in my regression for each of the `categ` values

Further, I have reason to believe that the two offsets for each category are related, so I’m using a covariance matrix to reflect that. I’ve followed along with a couple of different examples in the PyMC documentation, but I could use some help piecing together the posterior predictive likelihood. Here’s my model setup (not reproducible as written):
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

categ_idx = [....]
n_categories = 15

coords = {
    "categ": list(range(n_categories)),
    "param": ["a", "b"],
    "param_bis": ["a", "b"],
}

with pm.Model(coords=coords) as logistic_model:
    sd_dist = pm.Exponential.dist(0.5)
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=2, eta=2.0, sd_dist=sd_dist, compute_corr=True
    )
    a = pm.Normal("a", mu=0.0, sigma=25.0)
    b = pm.Normal("b", mu=0.0, sigma=25.0)
    z = pm.Normal("z", 0.0, 1.0, dims=("param", "categ"))
    ab_categ = pm.Deterministic("ab_categ", tt.dot(chol, z).T, dims=("categ", "param"))

    lin_pred = ((a + ab_categ[categ_idx.values, 0]) +
                (b + ab_categ[categ_idx.values, 1]) * X_train['x'].values)

    # Data likelihood
    likeli = pm.Bernoulli('likeli', logit_p=lin_pred, observed=y_train.values)

    trace = pm.sample(1000, tune=1000, target_accept=0.95)

burnt_trace = trace[200:]
```
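For anyone skimming, the core of the non-centered parameterization above is that the per-category offsets are built as `(chol @ z).T` from standard-normal draws. A minimal NumPy sketch (hypothetical values, not my fitted ones) showing how that construction induces correlated `a`/`b` offsets:

```python
import numpy as np

# Hypothetical Cholesky factor standing in for the sampled `chol`;
# the implied covariance of each (a_offset, b_offset) pair is chol @ chol.T.
rng = np.random.default_rng(0)
n_categories = 15

chol = np.array([[1.0, 0.0],
                 [0.8, 0.6]])               # lower-triangular, 2 x 2
z = rng.standard_normal((2, n_categories))  # standard-normal "raw" offsets

ab_categ = (chol @ z).T   # shape (n_categories, 2), one (a, b) offset pair per category
cov = chol @ chol.T       # implied 2 x 2 covariance of the offsets

print(ab_categ.shape)  # (15, 2)
print(cov)             # [[1.  0.8], [0.8 1. ]]
```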
The code below is what I have so far for the posterior predictive likelihood. Its output looks reasonable, but I don’t understand the indexing into `burnt_trace['ab_categ']` and really only got this working through trial and error. Before I move on to higher-dimensional problems, I’d like to know what each of the indices `a, b, c, d` in `burnt_trace['ab_categ'][a, b, c][d]` corresponds to.
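My current understanding (an assumption, based on the trace stacking posterior draws along the leading axis) is that `burnt_trace['ab_categ']` has shape `(n_draws, n_categories, n_params)`, matching the `dims=("categ", "param")` declaration. A NumPy mock illustrating what each index then selects:

```python
import numpy as np

# Mock array standing in for burnt_trace['ab_categ'], assuming shape
# (n_draws, n_categories, n_params).
n_draws, n_categories, n_params = 800, 15, 2
ab = np.arange(n_draws * n_categories * n_params, dtype=float).reshape(
    n_draws, n_categories, n_params)

categ_index = 3  # hypothetical category of interest

# Under this assumption, in ab[a, b, c]:
#   a -> posterior draw, b -> category, c -> parameter (0 = "a", 1 = "b")
# so ab[categ_index, 0, :] actually selects DRAW number `categ_index`,
# category 0, and both parameters -- the trailing [0] / [1] then pick the
# parameter, not the draw.
sel = ab[categ_index, 0, :]
print(sel.shape)  # (2,)

# To get all draws of the "a" offset for one category, index draws with ":" instead:
a_offsets = ab[:, categ_index, 0]
print(a_offsets.shape)  # (800,)
```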
```python
lin_pred = ((burnt_trace['a'] + burnt_trace['ab_categ'][categ_index, 0, :][0]) +
            (burnt_trace['b'] + burnt_trace['ab_categ'][categ_index, 0, :][1]) * x)
prob = (1 / (1 + np.exp(-lin_pred))).mean()
```
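For comparison, here is the vectorized version I think I’m aiming for: keep all draws along the leading axis, index only the category and parameter, and average the sigmoid over draws at the end. The arrays below are random stand-ins for the real trace (again assuming draws are on the leading axis):

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_categories = 800, 15

# Stand-ins for burnt_trace['a'], burnt_trace['b'], burnt_trace['ab_categ']
a = rng.normal(size=n_draws)
b = rng.normal(size=n_draws)
ab = rng.normal(size=(n_draws, n_categories, 2))

categ_index = 3   # hypothetical category
x = 0.7           # hypothetical regressor value

# One linear predictor per posterior draw for this category,
# then the posterior predictive mean of the success probability.
lin_pred = (a + ab[:, categ_index, 0]) + (b + ab[:, categ_index, 1]) * x
prob = (1.0 / (1.0 + np.exp(-lin_pred))).mean()

print(lin_pred.shape)    # (800,)
print(0.0 < prob < 1.0)  # True
```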
This example is where I drew the inspiration for the covariance structure.