Sensible prior for logistic regression with one or no input variable

Hi all,

I have what feels like a very basic question. I am trying to get the HPD interval for the mean of a binary outcome (i.e. the interval for a probability value). I am currently running a logistic regression with just one input variable, which is used to stratify the data, so essentially I just want to model probability values for different subgroups within the data. Currently I am using bambi, which uses a Normal prior. I was wondering if this is how one would normally go about this, or if there is a better way (maybe with another prior). Also, if I just have the outcome values and no input variable (so I just want the overall HPD interval for the probability across the whole dataset), what would be the ideal way to do that? For now I used a Beta prior with alpha=beta=1, but I don't know if this is sensible.
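On the Normal-prior question: in a logistic regression the Normal prior sits on the log-odds (logit) scale, not on the probability itself, so one way to judge whether it is sensible is to push prior draws through the logistic function and look at the prior it implies for p. The sketch below does that with NumPy only. The scales σ = 1.5 and σ = 10 are illustrative assumptions, not bambi's defaults, and `implied_prob_prior` / `extreme_mass` are hypothetical helper names.

```python
import numpy as np


def implied_prob_prior(sigma, n_draws=100_000, seed=0):
    """Prior on p implied by z ~ Normal(0, sigma) on the log-odds scale,
    i.e. p = logistic(z), as for the intercept of a logistic regression."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=n_draws)
    return 1.0 / (1.0 + np.exp(-z))


def extreme_mass(p, eps=0.01):
    """Fraction of prior draws lying within eps of 0 or of 1."""
    return float(np.mean((p < eps) | (p > 1.0 - eps)))


# Beta(1, 1) is the uniform prior on p used in the question, for comparison
flat = np.random.default_rng(1).beta(1.0, 1.0, size=100_000)

for sigma in (1.5, 10.0):
    mass = extreme_mass(implied_prob_prior(sigma))
    print(f"Normal(0, {sigma}) on the logit: {mass:.3f} of p near 0 or 1")
print(f"Beta(1, 1) on p: {extreme_mass(flat):.3f} of p near 0 or 1")
```

A very wide Normal prior on the logit is not "uninformative" for p: with σ = 10, well over half of the implied prior mass lies within 0.01 of 0 or 1, whereas a moderate scale such as σ = 1.5 keeps almost all of its mass away from the extremes.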

Thank you!

Hello :slight_smile:

If I understood this correctly, maybe this is what you are looking for:

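```python
# simulate data
import numpy as np
import pymc3 as pm

# true success probability for each group
groups = {
    0: 0.3,
    1: 0.1,
    2: 0.7
}
n = 100  # observations per group

groups_idx = []
observed = []

for group, p in groups.items():
    groups_idx.extend([group] * n)
    observed.extend([np.random.binomial(1, p) for sample in range(n)])
groups_idx = np.array(groups_idx)
observed = np.array(observed)

# build the model and sample from it
with pm.Model() as bernoulli_model:
    groups_data = pm.Data(
        'Groups Indices',
        groups_idx
    )
    observed_data = pm.Data(
        'Observed Data',
        observed
    )
    # weakly informative prior centred on 0.5, one p per group
    # (alpha=beta=2 assumed here; any alpha=beta>1 is centred on 0.5)
    p = pm.Beta(
        'p',
        alpha=2,
        beta=2,
        shape=len(groups)
    )
    # each outcome uses the p of its own group
    observed = pm.Bernoulli(
        'Observed',
        p=p[groups_data],
        observed=observed_data
    )
with bernoulli_model:
    trace_bernoulli = pm.sample()
```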

If what I illustrated reflects your case, you could also write this as a binomial model by summing the outcomes of the Bernoulli trials within each group.
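To make the binomial reformulation concrete, and to get the HPD intervals the question asks about, here is a minimal NumPy-only sketch (no PyMC3). It assumes simulated data laid out like the example (a group index plus a 0/1 outcome) and the Beta(1, 1) prior from the question. Summing the Bernoulli outcomes into counts does not change the posterior, because the binomial likelihood differs from the product of the Bernoulli likelihoods only by the constant C(n, k). With a Beta prior the model is also conjugate, so each group's posterior is Beta(1 + k_g, 1 + n_g - k_g) and can be sampled directly. The `hdi` helper is a hypothetical name; it returns the shortest interval containing 95% of the posterior draws, which is the usual sample-based HPD interval for a unimodal posterior (similar in spirit to ArviZ's `az.hdi`).

```python
import numpy as np


def hdi(samples, prob=0.95):
    """Shortest interval containing a `prob` fraction of the samples.

    This is the usual sample-based HPD interval; it assumes a unimodal posterior.
    """
    x = np.sort(np.asarray(samples))
    n_in = int(np.floor(prob * len(x)))        # index steps spanned by the interval
    widths = x[n_in:] - x[: len(x) - n_in]     # width of every candidate interval
    i = int(np.argmin(widths))                 # start of the narrowest one
    return float(x[i]), float(x[i + n_in])


rng = np.random.default_rng(42)

# Simulated data with the same layout as the example: a group index and a 0/1 outcome
true_p = np.array([0.3, 0.1, 0.7])
n_per_group = 100
groups_idx = np.repeat(np.arange(len(true_p)), n_per_group)
observed = rng.binomial(1, true_p[groups_idx])

# Sum the Bernoulli outcomes within each group: k successes out of n trials
k = np.bincount(groups_idx, weights=observed).astype(int)
n = np.bincount(groups_idx)

# Beta(1, 1) prior from the question; conjugate, so each group's posterior
# is Beta(1 + k_g, 1 + n_g - k_g) and can be sampled directly (no MCMC)
alpha_prior, beta_prior = 1.0, 1.0
intervals = {}
for g in range(len(true_p)):
    draws = rng.beta(alpha_prior + k[g], beta_prior + n[g] - k[g], size=20_000)
    intervals[g] = hdi(draws, prob=0.95)
    print(f"group {g}: {k[g]}/{n[g]} successes, 95% HPD for p = "
          f"[{intervals[g][0]:.3f}, {intervals[g][1]:.3f}]")

# No input variable: pool all observations into one binomial count
k_all, n_all = int(observed.sum()), len(observed)
draws_all = rng.beta(alpha_prior + k_all, beta_prior + n_all - k_all, size=20_000)
print("overall 95% HPD for p:", hdi(draws_all))
```

The last three lines cover the no-input case from the question: with a single group, the Beta(1, 1) approach already used there gives the posterior Beta(1 + k, 1 + n - k) in closed form, and the HPD interval comes straight from its draws.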


Great, thanks a lot! That’s exactly what I was looking for :slight_smile:
