I’ve built a logistic regression model on discrete data (2 binary predictors, 2 ordinal predictors). I expected the model summary to output a single row for the predicted variable, but instead it outputs a row for each observation in my dataset.
Is there an error in my code, or is this what the model predicts for each observation?
with pm.Model() as model:
    # priors on parameters
    beta_0 = pm.Normal("beta_0", mu=0, sigma=1)
    a = pm.Normal("A", mu=0, sigma=1)
    b = pm.Normal("B", mu=0, sigma=1)
    c = pm.Normal("C", mu=0, sigma=1)
    d = pm.Normal("D", mu=0, sigma=1)
    # probability of belonging to class 1
    p = pm.Deterministic(
        "p(Class=1)",
        pm.math.sigmoid(
            beta_0
            + a * cleaned["A"]
            + b * cleaned["B"]
            + c * cleaned["C"]
            + d * cleaned["D"]
        ),
    )

with model:
    # fit the data
    observed = pm.Bernoulli("Class=1", p, observed=cleaned["Predict"])
    start = pm.find_MAP()
    step = pm.Metropolis()
    # samples from posterior distribution
    trace = pm.sample(25000, step=step, initvals=start)
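
My tentative reading of the row-per-observation output, sketched in plain Python with made-up values: because `p` is computed from the columns of `cleaned`, it holds one probability per row, so a `Deterministic` wrapping it would plausibly contribute one summary row per observation. The `rows` list, the `draw` values, and the helper `class_probabilities` are hypothetical stand-ins, not my actual data or anything from PyMC.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical stand-in for `cleaned`: three rows of predictors A, B, C, D.
rows = [
    {"A": 0, "B": 1, "C": 2, "D": 1},
    {"A": 1, "B": 0, "C": 0, "D": 3},
    {"A": 1, "B": 1, "C": 1, "D": 2},
]

# Hypothetical single posterior draw of the five parameters.
draw = {"beta_0": -0.5, "A": 0.8, "B": -0.3, "C": 0.4, "D": 0.1}


def class_probabilities(rows, draw):
    """Return one P(Class=1) per row for a single parameter draw."""
    return [
        sigmoid(draw["beta_0"] + sum(draw[k] * row[k] for k in ("A", "B", "C", "D")))
        for row in rows
    ]


p = class_probabilities(rows, draw)
print(len(p))  # one probability per observation, not one overall
```

If that reading is right, and assuming the summary comes from ArviZ, restricting it to the coefficients with something like `az.summary(trace, var_names=["beta_0", "A", "B", "C", "D"])` should give five rows instead of one per observation.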
Also, the model trace shows a lot of variability in the distribution of the predicted variable. Does this indicate that my model is a poor fit, or rather that there’s a lot of uncertainty in the predictions?
Any feedback is greatly appreciated.