Priors for Dirichlet distribution

manhnguyen48 · May 6, 2019, 11:21am

I’ve got some binary questions that were put to respondents in each area. My goal is to estimate the average proportion of respondents saying yes to each question, for all areas.
My data frame has shape (N, 5), where N is the number of areas: 4 columns hold the number of people saying yes to each question in each area, and the 5th column (named sizes) holds the total number of people asked per area.

Some data processing:

import numpy as np
import pymc3 as pm

var_list = ['y1', 'y2', 'y3', 'y4']  # names of the four yes-count columns
vals = data.loc[:, var_list]
id_array = data.index.to_list()
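For readers without the original data, the layout described above can be mimicked with a small toy frame. This is only a sketch: the counts are randomly generated, `n_areas` and `true_p` are made-up values, and it assumes the four yes-counts in each area add up to `sizes`, as the later remark about mutual exclusivity suggests.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

n_areas = 6                                   # hypothetical number of areas (N)
sizes = rng.integers(50, 200, size=n_areas)   # people asked per area (made up)
true_p = np.array([0.4, 0.3, 0.2, 0.1])       # made-up answer proportions

# One multinomial draw per area, so each row of counts sums to that area's size
counts = np.array([rng.multinomial(n, true_p) for n in sizes])

data = pd.DataFrame(counts, columns=['y1', 'y2', 'y3', 'y4'])
data['sizes'] = sizes

# Same processing steps as in the post
var_list = ['y1', 'y2', 'y3', 'y4']
vals = data.loc[:, var_list]
id_array = data.index.to_list()
```

With a default integer index, `id_array` is simply `[0, 1, ..., N-1]`, one entry per area.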

I’ve had success modelling one binary outcome with hierarchical partial pooling as follows:

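with pm.Model() as model:
    # Global mean proportion and concentration shared by all areas
    u = pm.Uniform('global_p', lower=0.0, upper=1.0)
    v = pm.Gamma('v', alpha=1, beta=20)

    # Mean/concentration parameterisation of the Beta prior
    alpha = u * v
    beta = v * (1 - u)
    # One proportion per area, partially pooled towards u
    p_observed = pm.Beta('p_observed', alpha=alpha, beta=beta,
                         shape=len(data))

    # Counts for a single outcome, e.g. y1
    observed_data = pm.Binomial('observed_data',
                                n=data.sizes.values,
                                p=p_observed,
                                observed=data['y1'].values)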
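The products `u*v` and `v*(1-u)` amount to a mean/concentration parameterisation of the Beta distribution: the mean is alpha/(alpha+beta) = u, and alpha + beta = v. A minimal NumPy check of that identity, using arbitrary illustrative values for `u` and `v` (not values from this thread) and a hypothetical helper `beta_params`:

```python
import numpy as np

def beta_params(u, v):
    """Map a mean u in (0, 1) and a concentration v > 0 to Beta(alpha, beta)."""
    return u * v, v * (1.0 - u)

u, v = 0.3, 50.0                      # illustrative values only
alpha, beta = beta_params(u, v)

mean = alpha / (alpha + beta)         # analytic mean of Beta(alpha, beta)
concentration = alpha + beta          # recovers v

# Monte Carlo confirmation with draws from the same Beta distribution
rng = np.random.default_rng(1)
draws = rng.beta(alpha, beta, size=100_000)
```

Since the variance of this Beta is u(1-u)/(v+1), a larger `v` pulls the per-area proportions closer to `u` (more pooling), while a smaller `v` lets them vary more.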

However, I now want to model the 4 binary variables (y1, y2, y3, y4) jointly, because they are mutually exclusive (p1 + p2 + p3 + p4 = 1). I’ve tried the Dirichlet-Multinomial as follows, but I can’t work out which priors to put on the Dirichlet distribution to get partial pooling as in the case of one binary variable:

with pm.Model() as model:
    p_observed = pm.Dirichlet('p_observed', a=np.ones(4),
                              shape=(data.shape[0], 4))

    observed_data = pm.Multinomial('observed_data', p=p_observed[id_array],
                                   n=data.sizes.values,
                                   observed=vals.values)

Any comments would be much appreciated!
Many thanks,

chartl · May 6, 2019, 6:01pm

The direct generalization of your previous prior would be something like

u = pm.Dirichlet('dprior_p', np.ones(4, dtype=np.float32), shape=(4,))
v = pm.Gamma('dprior_v', alpha=1, beta=20, shape=(4,))
p_observed = pm.Dirichlet('proportions', a=u*v, shape=(4,))

It may be more effective to constrain the prior a little bit and let the shape of v be (1,), so that a single concentration parameter scales all four components.

There are certainly other things to do as well, but this should give you something to go on. Let me know how these work.
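To see what this prior implies, the hierarchy can be simulated forward in plain NumPy. This is a sketch under stated assumptions rather than the PyMC model itself: the global mean `u` and the concentrations `v` are fixed, made-up values instead of draws from their priors, and `simulate_areas` is a hypothetical helper. With a scalar `v`, the normalised Dirichlet parameters equal `u` exactly, so `v` only controls how tightly the per-area proportions cluster around `u`; with one `v` per component, the prior mean is proportional to `u*v` and no longer equals `u`.

```python
import numpy as np

def simulate_areas(u, v, sizes, rng):
    """Draw per-area proportions p ~ Dirichlet(u * v) and multinomial counts."""
    alpha = np.asarray(u) * v
    p = rng.dirichlet(alpha, size=len(sizes))          # shape (n_areas, 4)
    counts = np.array([rng.multinomial(n, row) for n, row in zip(sizes, p)])
    return p, counts

rng = np.random.default_rng(42)
u = np.array([0.4, 0.3, 0.2, 0.1])    # made-up global mean proportions
sizes = np.full(200, 100)             # 200 hypothetical areas, 100 respondents each

p_loose, counts_loose = simulate_areas(u, v=2.0, sizes=sizes, rng=rng)
p_tight, counts_tight = simulate_areas(u, v=200.0, sizes=sizes, rng=rng)

# Spread of the per-area proportions around u, averaged over the 4 outcomes
spread_loose = p_loose.std(axis=0).mean()
spread_tight = p_tight.std(axis=0).mean()

# Scalar v keeps the prior mean at u; a per-component v does not
mean_scalar_v = (u * 5.0) / (u * 5.0).sum()
v_vec = np.array([1.0, 2.0, 3.0, 4.0])                # illustrative per-component v
mean_vector_v = (u * v_vec) / (u * v_vec).sum()
```

Here `spread_loose` comes out larger than `spread_tight`: a small concentration leaves the areas nearly unpooled, a large one shrinks them towards the global mean, and putting a prior on `v` lets the data choose between the two.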

manhnguyen48 · May 6, 2019, 8:02pm

It does work indeed! Thank you!
