Exclude group inference for single parameter

  1. Why is y_obs modeled as a parameter although you wrote it in pm.Deterministic?

It would probably be better named y_pred or mu_y, since it is a deterministic function of the parameters rather than the observed data.

  1. Is this inference model scalable to large datasets?

Sure, but it would be clunky to implement in this way if you had several informatively-missing variables. For the more general case, I might write a helper function to generate a coefficient vector like so:

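import pymc3 as pm
import theano.tensor as tt

def make_coef(dat, missing_idx, name='beta'):
    # Call inside a pm.Model() context; pass a distinct name for each variable.
    coef = tt.ones_like(dat)
    # Zero the coefficient wherever the value is informatively missing.
    coef = tt.set_subtensor(coef[missing_idx], 0)
    beta = pm.Normal(name, 0., 1.)
    return beta * coef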

Then call it once per informatively-missing variable and stack the resulting coefficient vectors (e.g. with tt.stack).
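
To make the stacking step concrete, here is a rough NumPy sketch of the arithmetic only. It is not the PyMC model itself: fixed numbers stand in for the sampled beta values, and the data, the missing positions and the name masked_coef are all made up for illustration.

```python
import numpy as np

def masked_coef(dat, missing_idx, beta):
    """NumPy analogue of make_coef: beta where observed, 0 where missing."""
    coef = np.ones_like(dat, dtype=float)
    coef[missing_idx] = 0.0
    return beta * coef

# Hypothetical data: two predictors, five rows, made-up missing positions.
x1 = np.array([0.5, 1.0, 1.5, 2.0, 2.5])
x2 = np.array([3.0, 2.0, 1.0, 0.0, -1.0])
missing = {"x1": [1, 3], "x2": [0]}

# Fixed numbers stand in for the sampled beta values (an assumption).
coefs = np.stack([
    masked_coef(x1, missing["x1"], beta=2.0),
    masked_coef(x2, missing["x2"], beta=-1.0),
])  # shape (2, 5): one row per predictor

X = np.stack([x1, x2])
# Each row's linear predictor; a missing value contributes nothing.
mu = (coefs * X).sum(axis=0)
```

In the actual model, the analogous step would presumably be tt.stack over the outputs of make_coef, with each beta given its own variable name so the names don't collide.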