Prior distribution for "certainty" parameter of Beta distribution


#1

I was wondering if anyone here knew of any recommendations for the “certainty” or “kappa” parameter of a Beta distribution?

From these two sources:

the default non-informative prior is a Pareto(alpha=1, m=1.5) distribution, but using this distribution results in tails that are too long for my purposes.

Is there another distribution with shorter tails that would be recommended as a prior for this parameter? Basically, the inferred certainty parameters are on the order of thousands when the data themselves have sample sizes on the order of tens, and I want to incorporate this information into the prior.


#2

I think the Exponential prior as used in the PyMC3 docs is pretty good, and it has a much lighter tail compared to the Pareto.
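To make the difference in tail weight concrete, here is a quick check with scipy (my own sketch, not from the PyMC3 docs) comparing P(kappa > 1000) under the Pareto(alpha=1, m=1.5) prior and under kappa = exp(kappa_log) with kappa_log ~ Exponential(lam=1.5), as in the baseball example:

```python
import numpy as np
from scipy import stats

# Pareto(alpha=1, m=1.5): survival function is (m / x)**alpha
pareto_tail = stats.pareto(b=1.0, scale=1.5).sf(1000.0)

# kappa = exp(kappa_log) with kappa_log ~ Exponential(lam=1.5):
# P(kappa > 1000) = P(kappa_log > log(1000)) = exp(-1.5 * log(1000))
exp_tail = stats.expon(scale=1.0 / 1.5).sf(np.log(1000.0))

print(f"Pareto tail P(kappa > 1000):          {pareto_tail:.2e}")
print(f"log-Exponential tail P(kappa > 1000): {exp_tail:.2e}")
```

The Pareto prior puts almost fifty times more mass above 1000 than the log-scale Exponential prior does.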


#3

Hmm, that’s what I am doing. Unfortunately, the kappa parameter determined by the inference is something like 2500, while the sample sizes in the data are typically 1000 or less, with a median of 575. The sample sizes themselves follow a heavy-tailed distribution, but I don’t think a kappa in the thousands is a good fit for the data.

Is there any way I can incorporate that information into a prior distribution or am I thinking about this the wrong way?
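One simple way to fold the sample-size information into the prior (my own sketch, not something suggested elsewhere in this thread) is to put an Exponential prior directly on kappa and set its mean to the median sample size of 575, so values far above the observed sample sizes are discouraged:

```python
import numpy as np
from scipy import stats

median_n = 575.0  # median sample size from the data

# Exponential prior directly on kappa, with prior mean equal to median_n.
lam = 1.0 / median_n
kappa_prior = stats.expon(scale=1.0 / lam)

print(f"prior mean of kappa: {kappa_prior.mean():.0f}")
print(f"P(kappa > 2500) under this prior: {kappa_prior.sf(2500.0):.3f}")
```

In PyMC3 terms this would be something like `kappa = pm.Exponential('kappa', lam=1/575.)` in place of the log-scale parameterization; under this prior, only about 1% of the mass sits above the inferred value of 2500.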

Here is a histogram of the sample-sizes for reference:



#4

I see. Hmm, I would try a mixture, since the sample sizes > 1000 look like outliers.


#5

Is there any literature you could point me to for guidance?

I think what you are suggesting is that there is some probability pi of belonging to the outlier event: if you are in the outlier event you use one distribution, and if you aren’t, you use another.


#6

That’s exactly what I meant.

If you already have working code for your model, you can essentially copy and paste it to create a new component. Take the PyMC3 partial-pooling model as an example:

import pymc3 as pm
import theano.tensor as tt

# N, at_bats and hits come from the baseball data
with pm.Model() as baseball_model:

    phi = pm.Uniform('phi', lower=0.0, upper=1.0)

    kappa_log = pm.Exponential('kappa_log', lam=1.5)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))

    thetas = pm.Beta('thetas', alpha=phi*kappa, beta=(1.0-phi)*kappa, shape=N)
    y = pm.Binomial('y', n=at_bats, p=thetas, observed=hits)

to extend it into a mixture model:

with pm.Model() as mixture_model:

    phi = pm.Uniform('phi', lower=0.0, upper=1.0)

    kappa_log = pm.Exponential('kappa_log', lam=1.5)
    kappa = pm.Deterministic('kappa', tt.exp(kappa_log))

    thetas = pm.Beta('thetas', alpha=phi*kappa, beta=(1.0-phi)*kappa, shape=N)

    phi2 = pm.Uniform('phi2', lower=0.0, upper=1.0)

    kappa_log2 = pm.Exponential('kappa_log2', lam=1.5)
    kappa2 = pm.Deterministic('kappa2', tt.exp(kappa_log2))

    thetas2 = pm.Beta('thetas2', alpha=phi2*kappa2, beta=(1.0-phi2)*kappa2, shape=N)
    w = pm.Dirichlet(...) # <- careful here 
    y = pm.Mixture('y', w=w, comp_dists=[pm.Binomial.dist(n=at_bats, p=thetas),
                                         pm.Binomial.dist(n=at_bats, p=thetas2)],
                   observed=hits)

Of course, to do proper inference you need to think carefully about the priors for each parameter. Do a search on this Discourse - there are quite a few great discussions here already.