Hi -
I am attempting to predict a percentage variable (bounded by 0 and 1, with no instances of 0 or 1 occurring), using a continuous normal variable and a bounded normal variable. Here is my code:
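import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
from theano import shared

# `data` is assumed to be a pandas DataFrame; the predictors are wrapped in
# shared variables so they can be swapped out later for prediction
homespread = shared(data.HomeSpread.values)
homeodds = shared(data.ImpliedHomeOdds.values)
y = data.MidML_Home.values

with pm.Model() as model:
    # Define priors
    intercept = pm.Normal('Intercept', 0, sd=10)
    x0 = pm.Normal('x0', mu=0, sd=20)
    x1 = pm.Normal('x1', mu=0.5, sd=.1)
    # Linear predictor squashed to (0, 1)
    y_est = pm.math.sigmoid(intercept + x0 * homespread + x1 * homeodds)
    model_err = pm.Normal('model_err', mu=0.5, sd=.1)
    # Data likelihood
    y_like = pm.LogitNormal('y_like', mu=y_est, sd=model_err, observed=y)
    trace = pm.sample(20000, tune=5000)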
The LogitNormal likelihood has two problems: (1) it does not seem to handle values of the independent variables above 0 appropriately, and (2) it predicts a tighter posterior than occurs in reality, as tested via:
homespread.set_value(np.array([-7.]))
homeodds.set_value(np.array([.5]))
ppc = pm.sample_ppc(trace, model=model, samples=10000)
_, ax = plt.subplots(figsize=(12, 6))
ax.hist([n.mean() for n in ppc['y_like']], bins=19, alpha=0.5)
ax.axvline(data[data['HomeSpread']==-7.].MidML_Home.mean())
ax.set(title='Posterior predictive of the mean', xlabel='mean(x)', ylabel='Frequency');
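One thing that may be worth checking here is the scale of `mu`. As far as I can tell from its documented density, PyMC3's `LogitNormal` expects `mu` on the logit scale, so its median is `sigmoid(mu)`. The sketch below uses only the standard library, not PyMC3, and the linear-predictor values in `etas` are made up for illustration. It shows what happens to the median when `mu` has already been passed through a sigmoid, as in the model above, compared with passing the linear predictor directly.

```python
import math


def sigmoid(v):
    """Standard logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def logit_normal_median(mu):
    """Median of a logit-normal variable whose logit is Normal(mu, sd).

    The sigmoid is monotone, so the median of sigmoid(Normal(mu, sd))
    is sigmoid(mu), whatever the value of sd.
    """
    return sigmoid(mu)


# Hypothetical linear-predictor values covering a wide range
etas = [-5.0, -1.0, 0.0, 1.0, 5.0]

# mu = sigmoid(eta): the location is squashed into (0, 1) before use
medians_double = [logit_normal_median(sigmoid(eta)) for eta in etas]

# mu = eta: the location is the raw linear predictor
medians_direct = [logit_normal_median(eta) for eta in etas]

for eta, m_double, m_direct in zip(etas, medians_double, medians_direct):
    print(f"eta={eta:+.1f}  sigmoid-then-mu={m_double:.3f}  mu-direct={m_direct:.3f}")
```

Under that assumption, a `mu` that is already in (0, 1) confines the median to between 0.5 and about 0.73 regardless of the predictors, whereas passing the linear predictor directly lets it cover the whole unit interval. This may or may not be the whole story for the behaviour described above.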
Any suggestions for how best to model this problem in PyMC3 would be much appreciated. Of note, I have also tried a Beta distribution in place of the LogitNormal, but I struggled to identify appropriate alpha/beta priors and ended up with a posterior far too tight around 0.5.
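For the Beta attempt, one common way around choosing alpha and beta directly is to work with a mean and a precision instead. The following is a minimal sketch of that conversion in plain Python. The helper names and the precision symbol `kappa` are illustrative and not part of PyMC3.

```python
def beta_shapes_from_mean_precision(mean, kappa):
    """Convert a mean in (0, 1) and a precision kappa > 0 to Beta shapes.

    With alpha = mean * kappa and beta = (1 - mean) * kappa, the Beta
    distribution has expectation `mean`, and a larger kappa gives a
    tighter distribution around it.
    """
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if kappa <= 0.0:
        raise ValueError("kappa must be positive")
    return mean * kappa, (1.0 - mean) * kappa


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1.0))


# Same mean, increasing precision: the variance shrinks
for kappa in (2.0, 10.0, 100.0):
    a, b = beta_shapes_from_mean_precision(0.7, kappa)
    print(f"kappa={kappa:6.1f}  alpha={a:.2f}  beta={b:.2f}  var={beta_variance(a, b):.5f}")
```

In a model like the one above, the mean would presumably be the sigmoid of the linear predictor and `kappa` would get its own positive prior. That is an assumption about how this might be wired up, not something I have tested on this data.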
Thanks!