I have a quick question regarding how to properly specify a hierarchical Binomial model, and I feel it is rather basic, but I have not found any relevant info anywhere on this, so I was hoping someone will enlighten me here, which I would appreciate greatly.
When estimating a Binomial model using pm.Binomial()
, the option n
tells us how many trials, of success or failure, that we have data for, for a single individual. This allows one to calculate the probability p
, for example, of a coin landing heads, after we have flipped it n
times, for the single, solitary coin.
However, in a hierarchical Binomial model, we could imagine instead having a whole bag of faulty coins (from the same disreputable source), and we wish to estimate their individual p
‘s, with a hyperprior allowing each coin’s bias to affect the estimates for the other coins’ biases.
Based on Wiecki’s blogpost here, all this would seem simple, but the n
parameter is tripping me up - each individual has a different number of trials, so what should be put down?
My code so far:
import numpy as np
import copy
import pymc3 as pm
## Data generation
coin_data = [np.random.randint(0,2, size=np.random.randint(20,42)) for i in range(5)]
print(coin_data)
coin_data_index = copy.deepcopy(coin_data)
for index, mini_array in enumerate(coin_data_index):
for i in np.nditer(mini_array, op_flags=['readwrite']):
mini_array[...] = index
coin_data = np.hstack(coin_data)
coin_data_index = np.hstack(coin_data_index)
with pm.Model() as bag_of_coins_model:
## Each coin has its own intercept 'a', and 'b' is the global intercept
length = len( np.unique(coin_data_index) )
a = pm.Normal('a', 0, 3, shape = length)
b = pm.Normal('b', 0, 3)
p = pm.math.invlogit(a[coin_data_index] + b)
flipped_heads = pm.Binomial('flipped_heads', n=1, p=p, observed = coin_data)
trace = pm.sample(2000, tune=2000)
If I set n=length
, I get completely different results, so I don’t know which to use. I imagine n
should be equal to the number of times each coin is flipped/trials for each individual, but I do not know if that is what Pymc3 is indeed doing internally.
Would someone explain how and why to set n
in this example?