Priors for Dirichlet distribution

I’ve got some binary questions that were asked of respondents in several areas. My goal is to estimate the average proportion of respondents saying yes to each binary question across all areas.
My data frame has shape (N, 5), where N is the number of areas: 4 columns hold the number of people saying yes to each question in each area, and the 5th column (named sizes) holds the total number of people asked per area.

Some data processing:

var_list = ['y1', 'y2', 'y3', 'y4']  # column names of the four yes-counts
vals = data.loc[:,var_list]
id_array = data.index.to_list()

I’ve had success modelling one binary outcome with hierarchical partial pooling as follows:

with pm.Model() as model:
    u = pm.Uniform('global_p', lower=0.0, upper=1.0)  # global mean proportion
    v = pm.Gamma('v', alpha=1, beta=20)               # concentration around that mean

    alpha = u * v
    beta = v * (1 - u)
    p_observed = pm.Beta("p_observed", alpha=alpha, beta=beta)

    observed_data = pm.Binomial("observed_data",
                                n=data.sizes.values,
                                p=p_observed,
                                observed=vals.values)
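
For reference, this is how I read the prior above (standard Beta identities, written out only as a reading aid): u acts as the prior mean of p, and v as its concentration, so a larger v pulls p more strongly towards u.

```latex
\begin{aligned}
\alpha &= u v, \qquad \beta = v(1-u), \\
\mathbb{E}[p] &= \frac{\alpha}{\alpha+\beta} = \frac{uv}{v} = u, \\
\operatorname{Var}[p] &= \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{u(1-u)}{v+1}.
\end{aligned}
```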

However, I now want to model 4 binary variables at a time (y1, y2, y3, y4), as they are mutually exclusive (p1 + p2 + p3 + p4 = 1). I’ve tried a Dirichlet-Multinomial as follows, but I can’t work out which priors to put on the Dirichlet distribution to get partial pooling as in the case of one binary variable:

with pm.Model() as model:
    p_observed = pm.Dirichlet('p_observed', a=np.ones(4),
                              shape=(data.shape[0], 4))

    observed_data = pm.Multinomial('observed_data', p=p_observed[id_array],
                                   n=data.sizes.values,
                                   observed=vals.values)

Any comments would be much appreciated!
Many thanks,

The direct generalization of your previous prior would be something like

u = pm.Dirichlet('dprior_p', np.ones(4, dtype=np.float32), shape=(4,))
v = pm.Gamma('dprior_v', alpha=1, beta=20, shape=(4,))
p_observed = pm.Dirichlet('proportions', a=u*v, shape=(4,))

It may be more effective to constrain the prior a little bit and let the shape of v be (1,).
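
To illustrate why a single shared v is the closer analogue of the Beta prior: a Dirichlet with parameter vector a has mean a / sum(a). With a scalar v, a = u*v has mean exactly u and total concentration v. With a separate v per component, the mean is reweighted and in general no longer equals u. Here is a minimal sketch of that check using only the standard library; `dirichlet_mean` and `sample_dirichlet` are illustrative helper names (not PyMC functions), and the numbers are made up.

```python
import random

def dirichlet_mean(a):
    """Mean of Dirichlet(a): each component divided by the total."""
    total = sum(a)
    return [a_i / total for a_i in a]

def sample_dirichlet(a, rng):
    """Draw one Dirichlet(a) sample by normalizing independent Gamma(a_i, 1) draws."""
    draws = [rng.gammavariate(a_i, 1.0) for a_i in a]
    total = sum(draws)
    return [d / total for d in draws]

# Made-up global proportions, playing the role of 'dprior_p'.
u = [0.1, 0.2, 0.3, 0.4]

# Scalar concentration: the Dirichlet mean is u itself.
v_scalar = 50.0
a_scalar = [u_i * v_scalar for u_i in u]
mean_scalar = dirichlet_mean(a_scalar)

# Per-component concentration: the mean is pulled away from u.
v_vector = [5.0, 50.0, 50.0, 50.0]
a_vector = [u_i * v_i for u_i, v_i in zip(u, v_vector)]
mean_vector = dirichlet_mean(a_vector)

# One simulated area-level proportion vector under the scalar-v prior.
rng = random.Random(0)
area_p = sample_dirichlet(a_scalar, rng)

print("mean with scalar v:", mean_scalar)
print("mean with vector v:", mean_vector)
print("one area-level draw:", area_p)
```

With a scalar v, a large value keeps each area's proportions close to u (strong pooling) and a small value lets them vary more freely, which mirrors the role v plays in the single-outcome Beta model.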

There are certainly other things to do as well, but this should give you something to go on. Let me know how these work.

It does work indeed! Thank you!