Frequency of Missing Value Imputation?

EDIT: Sorry – false alarm – I had incorrectly understood my posterior dependency structure. There is no question here and all is well! This is a “selection prior” model not a “dropout” specification…

I’m playing around with PyMC in a slightly strange way, and I’m surprised by the behavior I’m seeing: it does not appear that missing values are imputed on every posterior iteration. The final plot shows the binary status of 10 missing Bernoulli variables as they are imputed over 400 posterior iterations, and they are far stickier than independent coin flips would be.

How are missing values imputed in PyMC? What’s the frequency/cadence of the imputation?

import numpy as np
import pymc as pm  # assumption: on older installs this would be `import pymc3 as pm`

n = 100
x = np.random.normal(loc=10, size=n)
width, depth = 10, 1
d = np.zeros((width, depth))  # if we mask this it will be automatically imputed
d = np.ma.masked_array(d, mask=d == 0)  # via sampling it from its distribution
with pm.Model() as dropout_network:
    dropout_rate = 0.5
    dropout_layers = pm.Bernoulli("dropout", p=dropout_rate, shape=d.shape, observed=d)
    mu = pm.Normal("prior", shape=d.shape)
    pm.Normal("likelihood", mu=mu.T.dot(dropout_layers), sigma=1, observed=x)
    step = pm.Slice([mu])
with dropout_network:
    trace = pm.sample(200, step=step, tune=200, chains=2)

Sorry – false alarm – these random variables have conditional dependence on the means in the posterior – so I was incorrectly expecting i.i.d. behavior!
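
To make the conditional dependence concrete, here is a minimal sketch (plain Python, not PyMC internals) of the full conditional probability that a single indicator is switched on, given the means, the other indicators, and the data. The helper names `log_lik` and `prob_on`, and the example state for `mu` and `d`, are hypothetical and chosen only for illustration.

```python
import math
import random

def log_lik(x, mean):
    """Gaussian log-likelihood with sigma = 1, up to an additive constant."""
    return -0.5 * sum((xi - mean) ** 2 for xi in x)

def prob_on(x, mu, d, j, p=0.5):
    """P(d[j] = 1 | mu, the other indicators, x) for the model above."""
    # Contribution of every other indicator to the likelihood mean.
    rest = sum(m * dk for k, (m, dk) in enumerate(zip(mu, d)) if k != j)
    log_on = math.log(p) + log_lik(x, rest + mu[j])
    log_off = math.log(1 - p) + log_lik(x, rest)
    # Logistic function of the log-odds, written to avoid overflow in exp().
    diff = log_on - log_off
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

random.seed(0)
x = [random.gauss(10, 1) for _ in range(100)]
mu = [10.0] + [0.0] * 9  # hypothetical state: one mean carries all the signal
d = [1] + [0] * 9

p_signal = prob_on(x, mu, d, 0)  # switching this one off would wreck the fit
p_null = prob_on(x, mu, d, 1)    # mu[1] = 0, so the data say nothing about d[1]
```

Only an indicator whose mean is (near) zero behaves like a coin flip; an indicator attached to a mean that the likelihood needs is pinned in place, which is the stickiness seen in the plot.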


Of relevance here is this previous post where @junpenglao notes that this dropout is a model fitting technique.

If I were to perform Metropolis sampling on my mu variables based on draws of d from the Bernoulli prior, I would be performing Monte Carlo integration as part of my posterior sampling scheme. As a strategy, with a (quite wide) uniform proposal distribution for mu, this could overcome the posterior symmetry (identifiability) problem that the more standard MCMC techniques will not address (as they will generally get stuck in one of the symmetric posterior modes), but it will not address the fact that this model specification is a “selection prior” not a “dropout” specification.
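
One possible reading of the Monte Carlo integration step, sketched in plain Python rather than PyMC: estimate the likelihood of mu with d integrated out by averaging over draws of d from its Bernoulli prior. The function name `mc_marginal_log_lik`, the number of draws, and the sigma = 1 likelihood written up to an additive constant are assumptions for illustration, not the scheme as actually implemented anywhere.

```python
import math
import random

def log_lik(x, mean):
    """Gaussian log-likelihood with sigma = 1, up to an additive constant."""
    return -0.5 * sum((xi - mean) ** 2 for xi in x)

def mc_marginal_log_lik(x, mu, p=0.5, draws=200, rng=random):
    """Estimate log p(x | mu) by averaging p(x | mu, d) over d ~ Bernoulli(p)."""
    logs = []
    for _ in range(draws):
        # Draw a fresh indicator vector from the prior, ignoring the data.
        d = [1 if rng.random() < p else 0 for _ in mu]
        logs.append(log_lik(x, sum(m * dk for m, dk in zip(mu, d))))
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(v - top) for v in logs) / draws)

random.seed(1)
x = [random.gauss(10, 1) for _ in range(100)]
est_zero = mc_marginal_log_lik(x, [0.0] * 10)          # every d gives mean 0
est_one = mc_marginal_log_lik(x, [10.0] + [0.0] * 9)   # d[0] decides the fit
```

A Metropolis step on mu could then compare such estimates at the current and proposed values. Because the indicators are averaged out rather than tracked, this sidesteps sticky indicators, but as noted above it does not turn the selection prior into dropout.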