I assume it would be something like this:
```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

with pm.Model() as model:
    n = pm.Data("n", value=simulated_data["N"])
    observed = simulated_data["observed_counts"]
    mix_frac = pm.Beta("mix_frac", alpha=1, beta=1)
    p_type0 = background_rate  # known background success probability
    p_type1 = signal_rate      # known signal success probability
    # some observations reflect p_type0 alone
    component_t0 = pm.Binomial.dist(
        n=n,
        p=p_type0,
    )
    # some observations reflect a 50-50 mix of p_type0 and p_type1
    component_t1 = pm.Binomial.dist(
        n=n,
        p=np.mean([p_type0, p_type1]),
    )
    observed_counts = pm.Mixture(
        "observed_counts",
        w=pt.stack([1 - mix_frac, mix_frac]),
        comp_dists=[component_t0, component_t1],
        observed=observed,
    )
    idata = pm.sample()
```
This assumes that the mixture for the `["sample_type"] == 1` observations is 50-50, and the model works well when the data mirrors this assumption.

If you wish to infer the mixture for the `["sample_type"] == 1` observations, you can do something like this:
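For reference, here is a minimal sketch of simulating data consistent with that 50-50 assumption. The rates, sizes, and seed below are made-up placeholders, not values from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs = 200
background_rate = 0.05   # placeholder p_type0
signal_rate = 0.30       # placeholder p_type1

N = rng.integers(50, 150, size=n_obs)          # trials per observation
sample_type = rng.integers(0, 2, size=n_obs)   # 0 = pure background, 1 = mixed

# sample_type == 0 observations reflect p_type0 alone;
# sample_type == 1 observations reflect a 50-50 mix of p_type0 and p_type1
p = np.where(sample_type == 0, background_rate,
             (background_rate + signal_rate) / 2)
observed_counts = rng.binomial(N, p)

simulated_data = {"N": N, "sample_type": sample_type,
                  "observed_counts": observed_counts}
```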
```python
    other_mix_frac = pm.Beta("other_mix_frac", alpha=1, beta=1)
    component_t1 = pm.Binomial.dist(
        n=n,
        p=other_mix_frac * p_type0 + (1 - other_mix_frac) * p_type1,
    )
```
This assumes that there is a single `other_mix_frac` governing all observations, and it works when the data mirrors this assumption (here I have set `simulated_data["fraction"] = 0.75`).
However, in your original formulation, each observation had its own unique `other_mix_frac` (what you called `simulated_data["fraction"]`). If you want to assume that `other_mix_frac` (i.e., `simulated_data["fraction"]`) varies across observations, you can do this:
```python
    other_mix_frac = pm.Beta(
        "other_mix_frac", alpha=1, beta=1, shape=len(simulated_data["N"])
    )
    component_t1 = pm.Binomial.dist(
        n=n,
        p=other_mix_frac * p_type0 + (1 - other_mix_frac) * p_type1,
        shape=len(simulated_data["N"]),
    )
```
This again works well when the data mirrors this assumption:
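For completeness, a sketch (again with placeholder rates and sizes, not values from the original post) of simulating data for this per-observation case, where each mixed observation draws its own fraction:

```python
import numpy as np

rng = np.random.default_rng(1)

n_obs = 200
background_rate = 0.05   # placeholder p_type0
signal_rate = 0.30       # placeholder p_type1

N = rng.integers(50, 150, size=n_obs)
sample_type = rng.integers(0, 2, size=n_obs)

# each observation draws its own mixing fraction
fraction = rng.beta(2, 2, size=n_obs)
p_mixed = fraction * background_rate + (1 - fraction) * signal_rate
p = np.where(sample_type == 0, background_rate, p_mixed)
observed_counts = rng.binomial(N, p)

simulated_data = {"N": N, "sample_type": sample_type,
                  "fraction": fraction, "observed_counts": observed_counts}
```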