Interpreting Output of Binomial Model

I’m working on a binomial model that in it’s simplest form has two parameters, one based on treatment received and the other based on gender.

I’m most interested in the p parameter and wanted to get feedback on whether I’m interpreting things correctly. The ultimate goal is to be able to work with a sample set that isn’t necessarily balanced between genders in order to make inferences about the value of p for the two treatments for the entire population.

Here’s an example:

import arviz as az
import numpy as np
import pymc3 as pm

n = 1_000

population_gender_share = [.5, .5]
sample_gender_share = [.75, .25]

true_male_p = 0.025
true_female_p = 0.075

true_treatment_effect = 0.5

treatment = np.random.choice([0, 1], n)

female = np.random.choice([0, 1], n, p=sample_gender_share)

p_clicked = ((true_female_p * female) + (true_male_p * (female - 1) * -1)) * (
    1 + treatment * true_treatment_effect

outcome = np.random.binomial(1, p=p_clicked)

with pm.Model() as m:
    b_treatment = pm.Normal("b_treatment", 0, 1.5, shape=2)
    treatment_id = pm.intX(pm.Data("treatment_id", treatment))

    b_gender = pm.Normal("b_gender", 0, 0.5, shape=2)
    gender_id = pm.intX(pm.Data("gender_id", female))

    p = pm.Deterministic(
        "p", pm.math.invlogit(b_gender[gender_id] + b_treatment[treatment_id])

    clicked = pm.Binomial("clicked", 1, p, observed=outcome)

    trace = pm.sample(random_seed=123, tune=2000)
    prior = pm.sample_prior_predictive(samples=1000)
    idata_model = az.from_pymc3(

    pm.set_data({"treatment_id": [0, 0, 1, 1], "gender_id": [0, 1, 0, 1]})
    post = pm.sample_posterior_predictive(
        idata_model.posterior, random_seed=123, var_names=["p"]

            "control": np.average(
                post["p"][:, :2],
            "treatment": np.average(
                post["p"][:, 2:],

Is the way I’m working with p in the posterior sample valid for inference?

I think so. I might suggest inspecting the actual distribution of predicted values rather than just plotting the means/HDI and making assumptions about what the underlying distribution actually looks like. That would give you a better sense of whether you are recovering the parameters used to generated the simulated data.

This scenario looks a bit like a non-hierarchical version of multilevel regression with poststratification (MRP). You are interested in population-level treatment effects and believe sex to moderate those treatment effects? And you suspect that you will have non-representative samples wrt to sex? Is that all accurate? If so, I might suggest looking into MRP (e.g., check out Austin’s blog post; it’s rather old and thus uses an older version of pymc, but should give you an idea about the approach). Your setup is much simpler and will thus required fewer levels in the hierarchy, but I would definitely recommend the hierarchical approach. Depending on your sample size and the sex imbalance you are expecting in your sample, a hierarchical model can extract more information (e.g., using data from males to help inform female parameters and vice versa) and may allow you to better generalize.


Good point. I was just show the forest plot as one example of how to work with the distributions in order to confirm I wasn’t misunderstanding something but I do intend to explore them further.

I should have mentioned this in the original post but this is exactly what I was going for. I actually used the post you mention to guide things a bit, however it was a bit difficult given the data no longer seems to be available and as you mentioned it uses an older version of pymc. I’d love to publish an updated version if things work out.

Eventually I intend to use a hierarchically model as well as introduce additional predictors but I wanted to make sure I was at least on the right track with a very simple version of the model. Thanks for the feedback!