I am attempting to predict a percentage variable (bounded by 0 and 1, with no instances of 0 or 1 occurring), using a continuous normal variable and a bounded normal variable. Here is my code:

    import pymc3 as pm
    from theano import shared

    # Shared variables so the predictors can be swapped out for prediction later
    homespread = shared(data.HomeSpread.values)
    homeodds = shared(data.ImpliedHomeOdds.values)
    y = data.MidML_Home.values

    with pm.Model() as model:
        # Define priors
        intercept = pm.Normal('Intercept', 0, sd=10)
        x0 = pm.Normal('x0', mu=0, sd=20)
        x1 = pm.Normal('x1', mu=0.5, sd=.1)
        y_est = pm.math.sigmoid(intercept + x0 * homespread + x1 * homeodds)
        model_err = pm.Normal('model_err', mu=0.5, sd=.1)

        # Data likelihood
        y_like = pm.LogitNormal('y_like', mu=y_est, sd=model_err, observed=y)

        trace = pm.sample(20000, tune=5000)

With the LogitNormal likelihood, the model (1) does not seem to handle values of the independent variables above 0 appropriately and (2) predicts a tighter posterior than occurs in reality, as tested via:

    import numpy as np
    import matplotlib.pyplot as plt

    # Predict for a single new observation
    homespread.set_value(np.array([-7.]))
    homeodds.set_value(np.array([.5]))
    ppc = pm.sample_ppc(trace, model=model, samples=10000)

    # Compare the posterior predictive to the observed mean at HomeSpread == -7
    _, ax = plt.subplots(figsize=(12, 6))
    ax.hist([n.mean() for n in ppc['y_like']], bins=19, alpha=0.5)
    ax.axvline(data[data['HomeSpread'] == -7.].MidML_Home.mean())
    ax.set(title='Posterior predictive of the mean', xlabel='mean(x)', ylabel='Frequency');

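One thing that may be worth checking in the model above: as far as I understand it, the `mu` of a logit-normal lives on the logit scale (that is, `logit(y) ~ Normal(mu, sd)`), so its median is `sigmoid(mu)`. Passing `mu=sigmoid(...)` would then restrict `mu` to (0, 1) and the median to roughly (0.5, 0.73). The NumPy-only sketch below illustrates that effect by simulation; the `eta` and `sigma` values are made up for illustration and are not taken from my data.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def logitnormal_draws(mu, sigma, size, rng):
    """Draw from a logit-normal: sigmoid of Normal(mu, sigma) draws."""
    return sigmoid(rng.normal(mu, sigma, size=(size,) + np.shape(mu)))


rng = np.random.default_rng(0)
eta = np.array([-3.0, 0.0, 3.0])  # hypothetical linear-predictor values
sigma = 0.5                       # hypothetical scale on the logit scale

# (a) mu = sigmoid(eta), as in the model above: mu is squeezed into (0, 1)
draws_squashed = logitnormal_draws(sigmoid(eta), sigma, 20000, rng)

# (b) mu = eta: the linear predictor is used directly on the logit scale
draws_linear = logitnormal_draws(eta, sigma, 20000, rng)

median_squashed = np.median(draws_squashed, axis=0)
median_linear = np.median(draws_linear, axis=0)

print("medians with mu = sigmoid(eta):", median_squashed)
print("medians with mu = eta:         ", median_linear)
```

If this reading is right, variant (a) can never place the bulk of the predictive mass below 0.5, which might explain some of the behaviour I am seeing.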
Any suggestions for how best to model this problem in PyMC3 would be much appreciated. Of note, I have also tried a Beta distribution in place of the LogitNormal, but I struggled to identify appropriate alpha/beta priors and ended up with a posterior far too tight around 0.5.
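For the Beta alternative, one common framing is a mean/concentration parameterization, where `alpha = mean * kappa` and `beta = (1 - mean) * kappa`, so the regression only has to describe the mean and a single positive `kappa` controls the spread. The sketch below is a NumPy-only illustration of that mapping, not my actual model; `beta_shapes`, `mean`, and `kappa` are hypothetical names and values chosen for the example.

```python
import numpy as np


def beta_shapes(mean, kappa):
    """Convert a mean in (0, 1) and a concentration kappa > 0 to Beta shapes."""
    alpha = mean * kappa
    beta = (1.0 - mean) * kappa
    return alpha, beta


rng = np.random.default_rng(1)
mean = np.array([0.2, 0.5, 0.8])  # hypothetical predicted means
kappa = 50.0                      # hypothetical concentration (larger = tighter)

alpha, beta = beta_shapes(mean, kappa)
draws = rng.beta(alpha, beta, size=(20000, mean.size))

sample_mean = draws.mean(axis=0)
sample_var = draws.var(axis=0)

# Variance implied by this parameterization: mean * (1 - mean) / (kappa + 1)
expected_var = mean * (1.0 - mean) / (kappa + 1.0)

print("sample means:      ", sample_mean)
print("sample variances:  ", sample_var)
print("expected variances:", expected_var)
```

In a PyMC3 model, `mean` would presumably be the sigmoid of the linear predictor and `kappa` would get a positive-valued prior, so only one spread parameter needs a prior instead of separate alpha and beta priors.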