And about the comparison between prediction and samples: You are comparing the wrong thing. The interval is the uncertainty in the predictor, it ignores the likelihood. You probably want to have a look at the predictive posterior instead. (What values would we expect for new measurements). You can get a sample from that using pm.sample_ppc(trace). Something like this:
with hierarchical_model:
ppc = pm3.sample_ppc(trace)
bd = ppc['y_pred']
bd_hpd = pm3.hpd(bd)
masked['pred'] = np.median(bd, axis=0)
masked['prctile_95'] = bd_hpd[:, 1]
masked['prctile_5'] = bd_hpd[:,0]