I’d be curious to know if there’s a better way to handle this, but I have had good outcomes doing it “by hand”. That is, using the same random variables to parameterize two distributions: one for the observed data and one for the missing data, then fusing the results back together.
For your example, I might try:
```python
import numpy as np
import pymc as pm
import aesara.tensor as at

# I assume w is a pandas Series of 0/1 values, with NaN marking the missing entries.
# Everything below goes inside your existing pm.Model() context, where probW lives.
obs_mask = w.notna().values
missing_mask = w.isna().values
n_missing = int(missing_mask.sum())  # pymc is very strict about the dtype of size!

# Set up a likelihood using only the observed values...
W_obs = pm.Bernoulli('W_obs', p=probW, observed=w[obs_mask].values)

# ...but sample some extra values to fill in the missing ones
W_missing = pm.Bernoulli('W_missing', p=probW, size=n_missing)

# Then fuse everything back together into one full-length vector
W = at.zeros(w.shape[0])
W = at.set_subtensor(W[np.where(obs_mask)[0]], W_obs)
W = at.set_subtensor(W[np.where(missing_mask)[0]], W_missing)
```
If you do this with MCMC you’ll find that you need a compound step sampler, because the missing values (in this case) come from a discrete distribution that NUTS can’t handle on its own, so that’s a bummer. PyMC will build the compound step for you automatically when you call `pm.sample()`.
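If you want to be explicit about the step assignment, a minimal sketch might look like the following (assuming the block above sits inside `with pm.Model() as model:`; `BinaryGibbsMetropolis` is the step PyMC typically assigns to binary variables anyway):

```python
with model:
    # BinaryGibbsMetropolis handles the discrete W_missing values;
    # the remaining continuous parameters get NUTS auto-assigned.
    step = pm.BinaryGibbsMetropolis([W_missing])
    idata = pm.sample(step=[step])
```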
You mention just taking expectations w.r.t. W and using those as the missing values. I believe that would correspond to `W = at.set_subtensor(W[np.where(missing_mask)[0]], W_obs.mean().repeat(n_missing))`? So you would just fill the missing values with the mean of the observed values and dispense with the secondary distribution over the missing values (`W_missing`, in my example).
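For what it’s worth, a sketch of that variant, swapping in `probW` for the empirical mean (since E[W] = probW for a Bernoulli; I’m assuming probW is a scalar here):

```python
# Expectation-filling variant: no W_missing RV at all.
W = at.zeros(w.shape[0])
W = at.set_subtensor(W[np.where(obs_mask)[0]], W_obs)
W = at.set_subtensor(W[np.where(missing_mask)[0]], probW * at.ones(n_missing))
```

Note that W then holds fractional values in the missing slots, so anything downstream that assumes W is strictly 0/1 needs care.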
Again, very eager for feedback.