I’ve built a logistic regression model on discrete data (2 binary predictors, 2 ordinal predictors) and expected the model summary to output only a single row for the predicted variable. Instead, it outputs a row for each observation in my dataset.
Is there an error in my code? Or is it what the model predicts given each observation?

    with pm.Model() as model:
        # priors on parameters
        beta_0 = pm.Normal("beta_0", mu=0, sigma=1)
        a = pm.Normal("A", mu=0, sigma=1)
        b = pm.Normal("B", mu=0, sigma=1)
        c = pm.Normal("C", mu=0, sigma=1)
        d = pm.Normal("D", mu=0, sigma=1)

        # probability of belonging to class 1
        p = pm.Deterministic(
            "p(Class=1)",
            pm.math.sigmoid(
                beta_0
                + a * cleaned["A"]
                + b * cleaned["B"]
                + c * cleaned["C"]
                + d * cleaned["D"]
            ),
        )

    with model:
        # fit the data
        observed = pm.Bernoulli("Class=1", p, observed=cleaned["Predict"])
        start = pm.find_MAP()
        step = pm.Metropolis()

        # samples from posterior distribution
        trace = pm.sample(25000, step=step, initvals=start)

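To make clearer what I think the `p(Class=1)` term computes, here is a minimal plain-Python sketch. The rows and coefficient values are made up for illustration (they are not from my data or my posterior), and `class_probabilities` is a hypothetical helper, not a PyMC function. Because the linear predictor is evaluated once per row, the result holds one probability per observation, which may be why the summary shows one row per observation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical observations: A and B binary, C and D ordinal.
rows = [
    {"A": 0, "B": 1, "C": 2, "D": 0},
    {"A": 1, "B": 0, "C": 1, "D": 3},
    {"A": 1, "B": 1, "C": 0, "D": 2},
]

# Hypothetical single draw of the coefficients (not real posterior values).
coef = {"beta_0": 0.1, "A": 0.5, "B": -0.3, "C": 0.2, "D": -0.1}


def class_probabilities(rows, coef):
    """Return one P(Class=1) value per row for a single coefficient draw."""
    return [
        sigmoid(coef["beta_0"] + sum(coef[k] * row[k] for k in ("A", "B", "C", "D")))
        for row in rows
    ]


p = class_probabilities(rows, coef)
print(len(p))  # one probability per observation
```

If this reading is right, the five coefficients are scalars, while `p(Class=1)` is a vector as long as the dataset.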
Also, the model trace shows a lot of variability in the distribution of the predicted variable. Does this indicate that my model is a poor fit, or rather that there’s a lot of uncertainty in the predictions?
Any feedback is greatly appreciated.