# Sensible prior for logistic regression with one or no input variable

Hi all,

I have what feels like a very basic question. I am trying to get the HPD interval for the mean of a binary outcome (i.e. the interval for a probability value). I am currently running a logistic regression with just one input variable, which is used for stratifying the data. So essentially I just want to model probability values for different subgroups within the data. Currently I am using Bambi, which puts a Normal prior on the coefficients. Is this how one would normally go about it, or is there a better way (maybe another prior)?

Also, if I only have the outcome values and no input variable (so I just want the overall HPD interval for the probability across the whole dataset), how would I ideally do that? For now I used a Beta prior with alpha=beta=1, but I don't know if this is sensible.

Thank you!

Hello, if I understood this correctly, maybe this is what you are looking for:

```python
# simulate data
import numpy as np
import pymc3 as pm

# true success probability for each group
groups = {
    0: 0.3,
    1: 0.1,
    2: 0.7,
}
n = 100  # observations per group

groups_idx = []
observed = []

for group, p in groups.items():
    groups_idx.extend([group] * n)
    observed.extend(np.random.binomial(1, p, size=n))

groups_idx = np.array(groups_idx)
observed = np.array(observed)

# build the model and sample from it
with pm.Model() as bernoulli_model:
    groups_data = pm.Data('Groups Indices', groups_idx)
    observed_data = pm.Data('Observed Data', observed)

    # weakly informative prior centred on 0.5, one p per group
    p = pm.Beta('p', 2, 2, shape=len(groups))

    # each observation uses the p of the group it belongs to
    likelihood = pm.Bernoulli(
        'Observed',
        p=p[groups_data],
        observed=observed_data,
    )

with bernoulli_model:
    trace_bernoulli = pm.sample()
```
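Since you asked about the HPD interval specifically: once you have posterior draws for a group (e.g. `trace_bernoulli['p'][:, 0]`), the HPD interval is just the narrowest interval that contains the mass you want. Here is a minimal sketch of that idea in plain Python. The function name `hpd_interval` is made up for illustration, and the Beta draws are only a stand-in for real posterior samples. In practice a library routine such as ArviZ's `az.hdi` is probably the more convenient choice.

```python
import math
import random


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval that contains `mass` of the samples.

    Assumes a unimodal posterior, so the HPD region is a single interval.
    """
    xs = sorted(samples)
    n = len(xs)
    k = math.ceil(mass * n)  # number of draws the interval must cover
    if not 1 <= k <= n:
        raise ValueError("mass must be in (0, 1] and samples non-empty")
    # slide a window of k consecutive sorted draws and keep the narrowest one
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


# Stand-in for real posterior draws of one group's p;
# with the model above you would pass trace_bernoulli['p'][:, 0] instead.
rng = random.Random(0)
draws = [rng.betavariate(32, 72) for _ in range(4000)]

lo, hi = hpd_interval(draws, mass=0.95)
print(f"95% HPD interval: [{lo:.3f}, {hi:.3f}]")
```

For a skewed posterior (say, a group with very few successes) this interval will be shifted relative to the equal-tailed one, which is usually the reason to prefer the HPD.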

If what I illustrated reflects your case, you could also model this as a binomial model summing the outcomes of the bernoulli trials over groups.
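To make the binomial version concrete, here is a rough sketch of the aggregation step, using only the standard library and the same illustrative group probabilities as in the simulation. It also shows a shortcut that I am adding as an aside: because the Beta prior is conjugate to the binomial likelihood, the posterior for each group is Beta(a + successes, b + failures), so for this simple model you do not strictly need MCMC. The variable names are my own. In PyMC3 the equivalent likelihood would be something like `pm.Binomial('Observed', n=trials, p=p, observed=successes)`.

```python
import random

rng = random.Random(1)
true_p = {0: 0.3, 1: 0.1, 2: 0.7}  # illustrative values, as in the simulation
n = 100

# simulate Bernoulli outcomes per group, then collapse them to counts
outcomes = {
    g: [1 if rng.random() < p else 0 for _ in range(n)]
    for g, p in true_p.items()
}
successes = {g: sum(v) for g, v in outcomes.items()}
trials = {g: len(v) for g, v in outcomes.items()}

# conjugate update: Beta(a, b) prior + binomial data -> Beta posterior
a, b = 2, 2  # same weakly informative prior as the model above
posterior = {
    g: (a + successes[g], b + trials[g] - successes[g])
    for g in true_p
}

for g, (a_post, b_post) in posterior.items():
    mean = a_post / (a_post + b_post)
    print(f"group {g}: Beta({a_post}, {b_post}), posterior mean {mean:.3f}")
```

The same update covers the case with no input variable at all: treat the whole dataset as a single group. With alpha=beta=1 the prior is uniform on [0, 1], and the posterior is Beta(1 + successes, 1 + failures).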


Great, thanks a lot! That’s exactly what I was looking for.