I have several binary questions that were asked of respondents in a number of areas. My goal is to estimate the average proportion of respondents answering yes to each binary question across all areas.
My data frame has shape (N, 5), where N is the number of areas: 4 columns hold the number of people saying yes in each area, and the 5th column (named sizes) holds the total number of people asked per area.
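To make the layout concrete, here is a minimal sketch with made-up numbers. The column names `y1` to `y4`, the counts, and the assumption that the four counts add up to `sizes` in every area are all illustrative, not the real data. It also computes the two extremes that partial pooling sits between: raw per-area proportions and fully pooled proportions.

```python
import pandas as pd

# Hypothetical toy data: 3 areas, "yes" counts per question, plus the sample size.
data = pd.DataFrame(
    {
        "y1": [10, 4, 7],
        "y2": [5, 6, 3],
        "y3": [3, 8, 6],
        "y4": [2, 2, 4],
        "sizes": [20, 20, 20],
    }
)

count_cols = ["y1", "y2", "y3", "y4"]

# No pooling: raw proportion per area and question.
raw_props = data[count_cols].div(data["sizes"], axis=0)

# Complete pooling: one proportion per question across all areas.
pooled_props = data[count_cols].sum() / data["sizes"].sum()

print(raw_props)
print(pooled_props)
```

A partially pooled estimate for each area would be expected to fall somewhere between its row in `raw_props` and `pooled_props`.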
Some data processing:
var_list = ['y1', 'y2', 'y3', 'y4']  # column names as strings, in a list so .loc selects columns
vals = data.loc[:, var_list]
id_array = data.index.to_list()
I’ve had success modelling one binary outcome with hierarchical partial pooling as follows:
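with pm.Model() as model:
    # global mean proportion and concentration shared by all areas
    u = pm.Uniform('global_p', lower=0.0, upper=1.0)
    v = pm.Gamma('v', alpha=1, beta=20)
    alpha = u * v
    beta = v * (1 - u)
    # one proportion per area, partially pooled towards global_p
    p_observed = pm.Beta("p_observed", alpha=alpha, beta=beta, shape=len(data))
    observed_data = pm.Binomial("observed_data",
                                n=data.sizes.values,
                                p=p_observed,
                                observed=data["y1"].values)  # a single outcome column, e.g. y1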
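For reference, the reparametrisation used in this model can be read as a mean/concentration form of the Beta distribution. Using the standard Beta moments, with u standing for global_p:

```latex
\alpha = u v, \qquad \beta = v (1 - u)
\;\Longrightarrow\;
\mathbb{E}[p] = \frac{\alpha}{\alpha + \beta} = \frac{u v}{u v + v (1 - u)} = u,
\qquad
\alpha + \beta = v,
\qquad
\operatorname{Var}[p] = \frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} = \frac{u (1 - u)}{v + 1}.
```

So u sets where the per-area proportions are centred, and v controls how tightly they are pulled towards that centre (larger v means stronger pooling).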
However, I now want to model the 4 binary variables (y1, y2, y3, y4) jointly, since they are mutually exclusive (p1 + p2 + p3 + p4 = 1). I’ve tried a Dirichlet-Multinomial as follows, but I can’t work out which priors to put on the Dirichlet distribution to get partial pooling as in the single-binary case:
with pm.Model() as model:
    p_observed = pm.Dirichlet('p_observed', a=np.ones(4),
                              shape=(data.shape[0], 4))
    observed_data = pm.Multinomial('observed_data', p=p_observed[id_array],
                                   n=data.sizes.values,
                                   observed=vals.values)
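One possible analogue of the Beta parametrisation above is to write the Dirichlet concentration vector as a global mean vector times a scalar concentration, a = global_p * kappa. The sketch below only simulates that generative process with numpy to show its behaviour; it is not a fitted model. The values of `global_p`, `kappa`, the number of areas, and the sample sizes are all made up for illustration. In a PyMC model, `global_p` and `kappa` would presumably get their own priors (for example a flat Dirichlet and a Gamma), which is an assumption rather than something tested here.

```python
import numpy as np

rng = np.random.default_rng(0)

n_areas = 200

# Hypothetical hyperparameters (in a fitted model these would be given priors).
global_p = np.array([0.4, 0.3, 0.2, 0.1])  # global mean vector, sums to 1
kappa = 50.0                                # concentration: larger = stronger pooling

# Dirichlet analogue of alpha = u*v, beta = v*(1-u).
a = global_p * kappa

# One probability vector per area, scattered around global_p.
p_area = rng.dirichlet(a, size=n_areas)     # shape (n_areas, 4)

# Simulated counts per area for a fixed number of respondents.
sizes = np.full(n_areas, 100)
counts = np.array([rng.multinomial(n, p) for n, p in zip(sizes, p_area)])

# The average of the per-area vectors should sit close to global_p.
print(p_area.mean(axis=0))
```

With a small `kappa` the per-area vectors spread out widely (little pooling); with a large `kappa` they collapse towards `global_p` (close to complete pooling).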
Any comments would be much appreciated!
Many thanks,