Fitting a mixture of binomials

I assume it would be something like this:

import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model:
    n = pm.Data("n", value=simulated_data["N"])
    observed = simulated_data["observed_counts"]
    mix_frac = pm.Beta("mix_frac", alpha=1, beta=1)
    p_type0 = background_rate
    p_type1 = signal_rate

    # some observations reflect p_type0
    component_t0 = pm.Binomial.dist(
        n=n,
        p=p_type0,
    )
    # some observations reflect a mix of p_type0 and p_type1
    component_t1 = pm.Binomial.dist(
        n=n,
        p=np.mean([p_type0, p_type1])
    )

    observed_counts = pm.Mixture(
        "observed_counts",
        w=pt.stack([1 - mix_frac, mix_frac]),
        comp_dists=[component_t0, component_t1],
        observed=observed,
    )
    
    idata = pm.sample()

This assumes that the ["sample_type"] == 1 mixture is 50-50, and the model recovers the parameters well when the data mirrors this assumption.
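For reference, here is a minimal data-generation sketch that matches this 50-50 assumption. The variable names (simulated_data, background_rate, signal_rate) mirror the model above, but the specific rates, sample size, and true mixing fraction are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

n_obs = 500
background_rate = 0.05   # success rate for type-0 observations (assumed)
signal_rate = 0.40       # success rate for the signal component (assumed)
mix_frac_true = 0.30     # true fraction of type-1 observations (assumed)

N = rng.integers(50, 200, size=n_obs)                 # trials per observation
sample_type = rng.binomial(1, mix_frac_true, size=n_obs)
# type-1 observations reflect a 50-50 mix of the two rates
p_per_obs = np.where(
    sample_type == 1,
    0.5 * background_rate + 0.5 * signal_rate,
    background_rate,
)
simulated_data = {
    "N": N,
    "sample_type": sample_type,
    "observed_counts": rng.binomial(N, p_per_obs),
}
```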

If you wish to infer the mixture for the ["sample_type"] == 1 observations, you can do something like this:

    other_mix_frac = pm.Beta("other_mix_frac", alpha=1, beta=1)
    component_t1 = pm.Binomial.dist(
        n=n,
        p=other_mix_frac * p_type0 + (1 - other_mix_frac) * p_type1,
    )

This assumes that there is a single other_mix_frac governing all observations, and it works when the data mirrors this assumption (here I have set simulated_data["fraction"] = 0.75).

However, in your original formulation, each observation had its own unique other_mix_frac (what you called simulated_data["fraction"]).

If you want to assume that other_mix_frac (i.e., simulated_data["fraction"]) varies across observations, you can do this:

    other_mix_frac = pm.Beta(
        "other_mix_frac", alpha=1, beta=1, shape=len(simulated_data["N"])
    )
    component_t1 = pm.Binomial.dist(
        n=n,
        p=(other_mix_frac * p_type0 + (1 - other_mix_frac) * p_type1),
        shape=len(simulated_data["N"]),
    )

This again works well when the data mirrors this assumption.
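A sketch of generating data for this last case, where each observation carries its own fraction (again, the names and the Beta(2, 2) choice are illustrative assumptions, not part of the original question). Note that with a single binomial draw per observation, the per-observation fractions are only weakly identified, so their posteriors will lean heavily on the prior:

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs = 500
background_rate, signal_rate = 0.05, 0.40   # assumed rates
mix_frac_true = 0.30                        # fraction of type-1 observations

N = rng.integers(50, 200, size=n_obs)
# each observation gets its own fraction, playing the role of
# simulated_data["fraction"], i.e. the per-observation other_mix_frac
fraction = rng.beta(2, 2, size=n_obs)
sample_type = rng.binomial(1, mix_frac_true, size=n_obs)
p_per_obs = np.where(
    sample_type == 1,
    fraction * background_rate + (1 - fraction) * signal_rate,
    background_rate,
)
simulated_data = {
    "N": N,
    "fraction": fraction,
    "observed_counts": rng.binomial(N, p_per_obs),
}
```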
