Dear experts,
I built a hierarchical model for my A/B test of click rates, using Gelman’s hyperprior p(alpha, beta) ∝ (alpha + beta)^(-5/2) (see eq. 5.9 in BDA) for the parameters of my beta prior. In the book this choice seems to be justified with this statement:
"In a problem such as this with a reasonably large amount of data, it is possible to set up a ‘noninformative’ hyperprior density that is dominated by the likelihood and yields a proper posterior distribution."
where he refers to the rat tumor example.
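For reference, here is a sketch of where this density comes from, assuming (as I read BDA section 5.3) that it is defined as a uniform prior on the prior mean and on the inverse square root of the "prior sample size". The symbols u and v are my own notation, not the book's:

$$
\begin{aligned}
u &= \frac{\alpha}{\alpha+\beta}, \qquad v = (\alpha+\beta)^{-1/2}, \qquad p(u, v) \propto 1, \\
\left|\det \frac{\partial(u, v)}{\partial(\alpha, \beta)}\right|
  &= \left| \frac{\beta}{(\alpha+\beta)^{2}} \cdot \left(-\tfrac{1}{2}\right)(\alpha+\beta)^{-3/2}
   - \left(-\frac{\alpha}{(\alpha+\beta)^{2}}\right) \cdot \left(-\tfrac{1}{2}\right)(\alpha+\beta)^{-3/2} \right|
   = \tfrac{1}{2}(\alpha+\beta)^{-5/2}, \\
p(\alpha, \beta) &\propto (\alpha+\beta)^{-5/2}.
\end{aligned}
$$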
In my case I only have two observations, one for option A and one for option B, so I am aware of this difference wrt the BDA example. Based on this I built the following model:
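import numpy as np
import pymc3 as pm
import theano.tensor as tt

def logp_ab(value):
    # log of the hyperprior density p(alpha, beta) ∝ (alpha + beta)^(-5/2)
    return tt.log(tt.pow(tt.sum(value), -5/2))

with pm.Model() as model:
    # improper flat prior on alpha, beta > 0; the Potential adds the hyperprior term
    ab = pm.HalfFlat('ab', shape=2, testval=np.asarray([1., 1.]))
    pm.Potential('p(a, b)', logp_ab(ab))
    alpha = pm.Deterministic('alpha', ab[0])
    beta = pm.Deterministic('beta', ab[1])
    # one click rate per option (A, B), sharing the same beta prior
    theta = pm.Beta('theta', alpha=ab[0], beta=ab[1], shape=2)
    p = pm.Binomial('y', p=theta, observed=[232, 194], n=[7890, 7795])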
running this with
pm.sample(draws=8000, tune=5000, chains=4, return_inferencedata=True, cores=4)
yields 8777 divergences, which probably means I need to reparametrize my model (I tried increasing target_accept and also the number of tuning steps, but I still get divergences).
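One reparametrization I am aware of is the one BDA itself uses for the rat tumor example (eq. 5.10, as I read it): work on the unconstrained scale x = log(alpha/beta), y = log(alpha + beta), where the same prior becomes p(x, y) ∝ alpha * beta * (alpha + beta)^(-5/2). Below is a minimal plain-Python sketch of that change of variables. The function names are hypothetical, and I have not verified that sampling on this scale removes the divergences for my data.

```python
import math

def ab_from_unconstrained(x, y):
    """Map x = log(alpha/beta), y = log(alpha + beta) back to (alpha, beta)."""
    mean = 1.0 / (1.0 + math.exp(-x))  # alpha / (alpha + beta), inverse logit of x
    total = math.exp(y)                # alpha + beta
    return mean * total, (1.0 - mean) * total

def log_prior_unconstrained(x, y):
    """Unnormalized log density of (x, y) implied by p(alpha, beta) ∝ (alpha + beta)^(-5/2)."""
    alpha, beta = ab_from_unconstrained(x, y)
    # The Jacobian of (alpha, beta) with respect to (x, y) is alpha * beta,
    # and the hyperprior term is -5/2 * log(alpha + beta) = -2.5 * y.
    return math.log(alpha) + math.log(beta) - 2.5 * y

# Round trip for alpha = 2, beta = 3
x0, y0 = math.log(2.0 / 3.0), math.log(5.0)
alpha0, beta0 = ab_from_unconstrained(x0, y0)
```

In the PyMC3 model this would presumably mean sampling x and y with flat priors and passing the equivalent of log_prior_unconstrained to pm.Potential, with alpha and beta computed as deterministic functions of x and y.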
I looked at the distributions of my parameters w/o divergences and w/ divergences, see below, but this doesn’t tell me much to my beginner/untrained eye. The only thing I notice is that the divergences are not clustered around particular values, which I’d think means that a re-parametrization is required.
What options do I have for ‘weakly informative priors’? E.g. a gamma distribution on the variance of my beta prior? I am trying to find the most weakly informative prior that still works for this case.
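To make the "prior on the variance" idea concrete: a beta distribution with mean m and variance v has alpha + beta = m(1 - m)/v - 1, which requires v < m(1 - m). Below is a small sketch of that mapping in plain Python. The function name is hypothetical, and which prior to put on m and v is exactly the open question.

```python
def beta_params_from_mean_var(mean, var):
    """Return (alpha, beta) of the beta distribution with the given mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    max_var = mean * (1.0 - mean)  # upper bound on the variance of a beta distribution
    if not 0.0 < var < max_var:
        raise ValueError("var must lie strictly between 0 and mean * (1 - mean)")
    concentration = max_var / var - 1.0  # alpha + beta
    return mean * concentration, (1.0 - mean) * concentration

# Beta(2, 3) has mean 0.4 and variance 0.04, so this recovers roughly (2.0, 3.0)
a_example, b_example = beta_params_from_mean_var(0.4, 0.04)
```

The bound on the variance depends on the mean, so a gamma prior on the variance alone would need truncation or a different scale, for example a prior on alpha + beta instead.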