Why does my logistic regression model summary give a mean for each observation?

I’ve built a logistic regression model on discrete data (2 binary predictors, 2 ordinal predictors). I expected the model summary to output a single row for the predicted variable, but instead it outputs a row for each observation in my dataset.

Is there an error in my code? Or is it what the model predicts given each observation?

import pymc as pm

# `cleaned` is my pandas DataFrame with columns A, B, C, D and Predict
with pm.Model() as model:
    # priors on parameters
    beta_0 = pm.Normal("beta_0", mu=0, sigma=1)
    
    a = pm.Normal("A", mu=0, sigma=1)
    b = pm.Normal("B", mu=0, sigma=1)
    c = pm.Normal("C", mu=0, sigma=1)
    d = pm.Normal("D", mu=0, sigma=1)

    # probability of belonging to class 1
    p = pm.Deterministic("p(Class=1)", pm.math.sigmoid(beta_0+
                                                     a*cleaned["A"]+
                                                     b*cleaned["B"]+
                                                     c*cleaned["C"]+
                                                     d*cleaned["D"])
                        )
with model:
    #fit the data 
    observed = pm.Bernoulli("Class=1", p, observed=cleaned["Predict"])
    start = pm.find_MAP()
    step = pm.Metropolis()
    
    #samples from posterior distribution 
    trace = pm.sample(25000, step=step, initvals=start)

Also, the model trace shows a lot of variability in the distribution of the predicted variable. Does this indicate that my model is a poor fit, or rather that there’s a lot of uncertainty in the predictions?

Any feedback is greatly appreciated :smiling_face:

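You are recording p by wrapping it in pm.Deterministic, so it is stored in the trace, and it has the same shape as your cleaned data: one value per observation. That is why the summary shows one row for each observation rather than a single row.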
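To make the shape argument concrete, here is a minimal sketch in plain Python (no PyMC). The six rows standing in for `cleaned` and the random "posterior draws" are made up for illustration; only the variable names come from the model above.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical stand-in for `cleaned`: 6 observations, 4 predictors each.
rows = [
    {"A": 0, "B": 1, "C": 2, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 3},
    {"A": 1, "B": 1, "C": 0, "D": 2},
    {"A": 0, "B": 0, "C": 3, "D": 1},
    {"A": 1, "B": 0, "C": 2, "D": 2},
    {"A": 0, "B": 1, "C": 1, "D": 0},
]
n_draws = 1000

# Fake "posterior draws" (assumption): one scalar per draw for each coefficient.
names = ["beta_0", "A", "B", "C", "D"]
draws = {name: [random.gauss(0.0, 1.0) for _ in range(n_draws)] for name in names}

# p is computed per observation, so every draw yields len(rows) values.
p_draws = [
    [
        sigmoid(draws["beta_0"][i] + sum(draws[k][i] * row[k] for k in "ABCD"))
        for row in rows
    ]
    for i in range(n_draws)
]

# A summary table has one row per scalar quantity that was recorded.
summary_rows = {name: sum(values) / n_draws for name, values in draws.items()}
for j in range(len(rows)):
    summary_rows[f"p(Class=1)[{j}]"] = sum(d[j] for d in p_draws) / n_draws

print(len(summary_rows))  # 5 coefficients + 6 observations
```

If you only want the coefficient rows, ArviZ lets you restrict the table with something like `az.summary(trace, var_names=["beta_0", "A", "B", "C", "D"])`. Alternatively, dropping the pm.Deterministic wrapper should keep p out of the trace altogether.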

The uncertainty doesn’t look unreasonable given the spread of the parameters A-D.
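As a rough illustration of how coefficient spread carries over to p, the sketch below pushes made-up draws through the sigmoid for one hypothetical observation and compares the width of the central 90% interval of p for two spreads. The standard deviations 0.1 and 1.0 and the observation values are assumptions, and the coefficients are drawn independently, which ignores any posterior correlation in a real trace.

```python
import math
import random
import statistics

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interval_width(coef_sd, n_draws=5000):
    """Width of the central 90% interval of p for one fixed observation."""
    x = {"A": 1, "B": 0, "C": 2, "D": 1}  # hypothetical observation
    ps = []
    for _ in range(n_draws):
        eta = random.gauss(0.0, coef_sd)  # intercept draw
        eta += sum(random.gauss(0.0, coef_sd) * value for value in x.values())
        ps.append(sigmoid(eta))
    # n=20 gives 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(ps, n=20)
    return cuts[-1] - cuts[0]

narrow = interval_width(0.1)  # tightly determined coefficients
wide = interval_width(1.0)    # coefficients with spread similar to the N(0, 1) priors

print(f"90% interval width of p: narrow={narrow:.2f}, wide={wide:.2f}")
```

The wider the coefficient draws, the wider the interval on p, so a broad distribution for p can reflect uncertainty in A-D rather than an error in the model.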
