Suppose we sample from three binomial distributions, each with n=2 but a different success probability: x_i \sim \text{Binomial}(2, p_i) for i = 1, 2, 3. We then sum those binomial realizations to create a new random variable z = x_1 + x_2 + x_3.

Given some data for z, I want to build a Bayesian model to infer the posterior distribution of each p_i. I used the following code to generate some fake data:

```
import numpy as np

# True probabilities of each of the Binomials
true_probs = [0.5, 0.85, 0.35]
N = 50  # dataset size

np.random.seed(42)
draws = []
for p in true_probs:
    draws.append(np.random.binomial(n=2, p=p, size=N))

draws = np.column_stack(draws)  # shape (N, 3), one column per binomial
z = np.sum(draws, axis=1)
```
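As a quick sanity check on the simulated data (same generation as above, just condensed into one snippet):

```
import numpy as np

np.random.seed(42)
true_probs = [0.5, 0.85, 0.35]
draws = np.column_stack(
    [np.random.binomial(n=2, p=p, size=50) for p in true_probs]
)
z = draws.sum(axis=1)

# every observed value of z falls in {0, ..., 6}
print(np.unique(z))
```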

I know that z can take 7 distinct values (0 through 6). I want my model to infer the posterior distribution of each p_i given the observed data for z, and this is what I tried before getting stuck on the choice of likelihood function:

```
import pymc as pm

with pm.Model() as model:
    p = pm.Uniform('p', 0, 1, shape=(3,))
    x1 = pm.Binomial('x1', n=2, p=p[0])
    x2 = pm.Binomial('x2', n=2, p=p[1])
    x3 = pm.Binomial('x3', n=2, p=p[2])
    z_obs = pm.Deterministic('z_obs', x1 + x2 + x3)
```

I know that I cannot pass `observed=` to `pm.Deterministic`, but I am stuck on how to link the observed data back to the priors on p.

Hopefully my question is clear. Can anyone suggest how to proceed?
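For context, my understanding is that since the x_i are independent, the pmf of z is the convolution of the three Binomial(2, p_i) pmfs. A minimal NumPy sketch of that pmf (the function names `binom2_pmf` and `z_pmf` are just mine), which I imagine would have to be wrapped into a custom likelihood somehow (e.g. `pm.CustomDist`), though that is exactly where I am stuck:

```
import numpy as np

def binom2_pmf(p):
    # pmf of Binomial(2, p) over the support {0, 1, 2}
    return np.array([(1 - p) ** 2, 2 * p * (1 - p), p ** 2])

def z_pmf(probs):
    # pmf of z = x1 + x2 + x3 over {0, ..., 6}: the convolution
    # of the three independent Binomial(2, p_i) pmfs
    pmf = binom2_pmf(probs[0])
    for p in probs[1:]:
        pmf = np.convolve(pmf, binom2_pmf(p))
    return pmf

true_probs = [0.5, 0.85, 0.35]
pmf = z_pmf(true_probs)  # length-7 vector that sums to 1
```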