What does the hierarchical model look like when the observed data contain missing values?

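Basically, the observed variable is split in two. The non-missing entries contribute the usual likelihood term p(non-missing | \theta). The missing entries get the same distribution, p(missing | \theta), but with \theta drawn from its posterior, so their draws effectively come from p(missing | \theta, non-missing). The latter part is added to the model as a free RV, so you actually see it in the trace; think of it as being more like a posterior sample than like data.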
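
To make the split concrete, here is a minimal hand-rolled sketch in plain Python. It is not what the library does internally; it assumes a toy conjugate model (Normal prior on theta, Normal likelihood with known sigma) so the posterior can be written in closed form. The data, the prior settings, and the names `y`, `trace`, and `y_missing` are all made up for illustration.

```python
import random

random.seed(0)

# Hypothetical data: None marks a missing observation.
y = [1.2, None, 0.7, 1.9, None, 1.1]
observed = [v for v in y if v is not None]
n_missing = sum(v is None for v in y)

# Assumed toy model: theta ~ Normal(0, 10), y_i ~ Normal(theta, 1).
prior_mu, prior_sd, sigma = 0.0, 10.0, 1.0

# Only the non-missing entries enter the likelihood, so the conjugate
# posterior for theta is computed from `observed` alone.
post_prec = 1 / prior_sd**2 + len(observed) / sigma**2
post_sd = post_prec ** -0.5
post_mu = (prior_mu / prior_sd**2 + sum(observed) / sigma**2) / post_prec

# Each "trace" entry holds a draw of theta and, conditional on it,
# a draw of every missing entry from p(missing | theta).
trace = []
for _ in range(5000):
    theta = random.gauss(post_mu, post_sd)
    y_missing = [random.gauss(theta, sigma) for _ in range(n_missing)]
    trace.append((theta, y_missing))
```

Every entry of `trace` contains both a `theta` draw and the imputed values, which mirrors the missing entries showing up as their own variable in the trace. The imputed draws are wider than the posterior of `theta` because they carry the observation noise `sigma` as well.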