LogitNormal vs. Beta vs. Logistic

Hi -

I am attempting to predict a percentage variable (bounded by 0 and 1, with no instances of 0 or 1 occurring), using a continuous normal variable and a bounded normal variable. Here is my code:

import pymc3 as pm
from theano import shared

homespread = shared(data.HomeSpread.values)
homeodds = shared(data.ImpliedHomeOdds.values)

with pm.Model() as model: 

    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=10)
    x0 = pm.Normal('x0', mu=0, sd=20)
    x1 = pm.Normal('x1', mu=0.5, sd=.1)
    y_est = pm.math.sigmoid(intercept+x0*homespread+x1*homeodds)
    model_err = pm.Normal('model_err',mu=0.5,sd=.1)

    # Data likelihood
    y_like = pm.LogitNormal('y_like',mu=y_est,sd=model_err,observed=y)

    trace = pm.sample(20000,tune=5000)

The LogitNormal (1) does not seem to appropriately handle independent values above 0 and (2) predicts a tighter posterior than occurs in reality, as tested via:


import matplotlib.pyplot as plt

ppc = pm.sample_ppc(trace, model=model, samples=10000)
_, ax = plt.subplots(figsize=(12, 6))
ax.hist([n.mean() for n in ppc['y_like']], bins=19, alpha=0.5)
ax.set(title='Posterior predictive of the mean', xlabel='mean(x)', ylabel='Frequency');

Any suggestions for how best to model this problem in PyMC3 would be much appreciated. Of note, I have also tried a Beta distribution in place of the LogitNormal, but struggled to identify appropriate alpha/beta priors and ended up with a posterior far too tight around .5.


Without the data or a figure, it is a bit difficult to say what the problem is. Just FYI, you can parameterize the Beta distribution with a mean and sd, as long as the sd satisfies sd < sqrt(mu * (1 - mu)), i.e. the variance is below mu * (1 - mu).
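
To make the mean/sd parameterization concrete, here is a minimal sketch of the standard moment-matching conversion from (mu, sd) to the usual alpha/beta shape parameters. The function name beta_shape_from_mean_sd is hypothetical and only for illustration; the example values mu=0.5, sd=0.1 are borrowed from the priors in the question.

```python
import math


def beta_shape_from_mean_sd(mu, sd):
    """Convert a mean/sd pair into Beta(alpha, beta) shape parameters.

    Hypothetical helper for illustration; uses standard moment matching.
    """
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    # A Beta distribution's variance is always below mu * (1 - mu).
    max_sd = math.sqrt(mu * (1.0 - mu))
    if not 0.0 < sd < max_sd:
        raise ValueError("sd must satisfy 0 < sd < sqrt(mu * (1 - mu))")
    # kappa = alpha + beta, the "concentration" of the distribution.
    kappa = mu * (1.0 - mu) / sd ** 2 - 1.0
    return mu * kappa, (1.0 - mu) * kappa


# Example: mean 0.5 and sd 0.1 give alpha and beta of roughly 12 each.
alpha, beta = beta_shape_from_mean_sd(0.5, 0.1)
print(alpha, beta)
```

In PyMC3 itself, pm.Beta accepts mu and sd keyword arguments, so the likelihood in the question could presumably be written as pm.Beta('y_like', mu=y_est, sd=model_err, observed=y), provided the prior on model_err keeps it positive and below the bound above.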