Prior distribution for "certainty" parameter of Beta distribution


#1

I was wondering if anyone here knew of any recommendations for the “certainty” or “kappa” parameter of a Beta distribution?

From these two sources:

the default non-informative prior is a Pareto(alpha=1, m=1.5) distribution, but using this distribution results in tails that are too long for my purposes.

Is there another distribution with shorter tails that would be recommended as a prior for this parameter? Basically, the inferred certainty parameters are on the order of thousands when the data themselves have sample sizes on the order of tens, and I want to incorporate this information into the prior.


#2

I think the Exponential prior as used in the PyMC3 docs is pretty good, and it has a much lighter tail compared to the Pareto.
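To make the difference in tail weight concrete, here is a quick check with scipy (my own sketch, not from the PyMC3 docs) comparing P(kappa > 1000) under the Pareto(alpha=1, m=1.5) prior and under kappa = exp(kappa_log) with kappa_log ~ Exponential(lam=1.5), as in the baseball example:

```python
import numpy as np
from scipy import stats

# Pareto(alpha=1, m=1.5): survival function is (m / x)**alpha
pareto_tail = stats.pareto(b=1.0, scale=1.5).sf(1000.0)

# kappa = exp(kappa_log) with kappa_log ~ Exponential(lam=1.5):
# P(kappa > 1000) = P(kappa_log > log(1000)) = exp(-1.5 * log(1000))
exp_tail = stats.expon(scale=1.0 / 1.5).sf(np.log(1000.0))

print(f"Pareto tail P(kappa > 1000):          {pareto_tail:.2e}")
print(f"log-Exponential tail P(kappa > 1000): {exp_tail:.2e}")
```

The Pareto prior puts almost fifty times more mass above 1000 than the log-scale Exponential prior does.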


#3

Hmm, that’s what I am doing. Unfortunately, the kappa parameter determined by the inference is something like 2500, while the sample sizes in the data are typically 1000 or less, with a median of 575. The sample sizes themselves follow a heavy-tailed distribution, but I don’t think a kappa in the thousands is a good fit for the data.

Is there any way I can incorporate that information into a prior distribution or am I thinking about this the wrong way?
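One simple way to fold the sample-size information into the prior (my own sketch, not something suggested elsewhere in this thread) is to put an Exponential prior directly on kappa and set its mean to the median sample size of 575, so values far above the observed sample sizes are discouraged:

```python
import numpy as np
from scipy import stats

median_n = 575.0  # median sample size from the data

# Exponential prior directly on kappa, with prior mean equal to median_n.
lam = 1.0 / median_n
kappa_prior = stats.expon(scale=1.0 / lam)

print(f"prior mean of kappa: {kappa_prior.mean():.0f}")
print(f"P(kappa > 2500) under this prior: {kappa_prior.sf(2500.0):.3f}")
```

In PyMC3 terms this would be something like `kappa = pm.Exponential('kappa', lam=1/575.)` in place of the log-scale parameterization; under this prior, only about 1% of the mass sits above the inferred value of 2500.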

Here is a histogram of the sample-sizes for reference:



#4

I see. Hmm, I would try a mixture, since the sample sizes > 1000 look like outliers.


#5

Is there any literature you could point me to for guidance?

I think what you are suggesting is that there is some probability pi of belonging to the outlier event: if you are in the outlier event you use one distribution, and if you aren’t, you use another.


#6

That’s exactly what I meant.

If you already have working code for your model, you can essentially copy and paste it to create a new component. Take the PyMC3 partial-pooling model as an example:

import pymc3 as pm
import theano.tensor as tt

# N, at_bats and hits come from the baseball data
with pm.Model() as baseball_model:

    phi = pm.Uniform('phi', lower=0.0, upper=1.0)

    kappa_log = pm.Exponential('kappa_log', lam=1.5)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))

    thetas = pm.Beta('thetas', alpha=phi*kappa, beta=(1.0-phi)*kappa, shape=N)
    y = pm.Binomial('y', n=at_bats, p=thetas, observed=hits)

to extend it into a mixture model:

with pm.Model() as mixture_model:

    phi = pm.Uniform('phi', lower=0.0, upper=1.0)

    kappa_log = pm.Exponential('kappa_log', lam=1.5)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))

    thetas = pm.Beta('thetas', alpha=phi*kappa, beta=(1.0-phi)*kappa, shape=N)

    phi2 = pm.Uniform('phi2', lower=0.0, upper=1.0)

    kappa_log2 = pm.Exponential('kappa_log2', lam=1.5)
    kappa2 = pm.Deterministic('kappa2', tt.exp(kappa_log2))

    thetas2 = pm.Beta('thetas2', alpha=phi2*kappa2, beta=(1.0-phi2)*kappa2, shape=N)
    w = pm.Dirichlet(...) # <- careful here 
    y = pm.Mixture('y', w=w, comp_dists=[pm.Binomial.dist(n=at_bats, p=thetas),
                                         pm.Binomial.dist(n=at_bats, p=thetas2)],
                   observed=hits)

Of course, to do proper inference you need to think carefully about the priors for each parameter. Do a search on this Discourse - there are quite a few great discussions here already.