I have a quick question regarding how to properly specify a hierarchical Binomial model, and I feel it is rather basic, but I have not found any relevant info anywhere on this, so I was hoping someone will enlighten me here, which I would appreciate greatly.
When estimating a Binomial model using
pm.Binomial(), the option
n tells us how many trials, of success or failure, that we have data for, for a single individual. This allows one to calculate the probability
p, for example, of a coin landing heads, after we have flipped it
n times, for the single, solitary coin.
However, in a hierarchical Binomial model, we could imagine instead having a whole bag of faulty coins (from the same disreputable source), and we wish to estimate their individual
p‘s, with a hyperprior allowing each coin’s bias to affect the estimates for the other coins’ biases.
Based on Wiecki’s blogpost here, all this would seem simple, but the
n parameter is tripping me up - each individual has a different number of trials, so what should be put down?
My code so far:
import numpy as np import copy import pymc3 as pm ## Data generation coin_data = [np.random.randint(0,2, size=np.random.randint(20,42)) for i in range(5)] print(coin_data) coin_data_index = copy.deepcopy(coin_data) for index, mini_array in enumerate(coin_data_index): for i in np.nditer(mini_array, op_flags=['readwrite']): mini_array[...] = index coin_data = np.hstack(coin_data) coin_data_index = np.hstack(coin_data_index) with pm.Model() as bag_of_coins_model: ## Each coin has its own intercept 'a', and 'b' is the global intercept length = len( np.unique(coin_data_index) ) a = pm.Normal('a', 0, 3, shape = length) b = pm.Normal('b', 0, 3) p = pm.math.invlogit(a[coin_data_index] + b) flipped_heads = pm.Binomial('flipped_heads', n=1, p=p, observed = coin_data) trace = pm.sample(2000, tune=2000)
If I set
n=length, I get completely different results, so I don’t know which to use. I imagine
n should be equal to the number of times each coin is flipped/trials for each individual, but I do not know if that is what Pymc3 is indeed doing internally.
Would someone explain how and why to set
n in this example?