Frequency of Missing Value Imputation?

schwarls37 · February 15, 2022, 6:58pm

EDIT: Sorry, false alarm: I had misunderstood my posterior dependency structure. There is no question here and all is well! This is a “selection prior” model, not a “dropout” specification…

I’m playing around with PyMC in a slightly strange way, and I’m surprised by the behavior I’m seeing: missing values do not appear to be imputed afresh on every posterior iteration. The final plot shows the binary status of 10 missing Bernoulli variables as they are imputed over 400 posterior iterations, and they are much stickier than independent coin flips could be.

How are missing values imputed in PyMC? What’s the frequency/cadence of the imputation?

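import numpy as np
import pymc3 as pm  # assumed: PyMC3 (3.x); use "import pymc as pm" for PyMC >= 4

n = 100
x = np.random.normal(loc=10, size=n)
width, depth = 10, 1
d = np.zeros((width, depth))
# every entry of d is 0, so mask=(d == 0) masks all of them; masked entries are
# treated as missing and automatically imputed (sampled as extra unknowns)
d = np.ma.masked_array(d, mask=(d == 0))
with pm.Model() as dropout_network:
    dropout_rate = 0.5
    # observed masked array: the masked cells become unknowns with a Bernoulli(0.5) prior
    dropout_layers = pm.Bernoulli("dropout", p=dropout_rate, shape=d.shape, observed=d)
    mu = pm.Normal("prior", shape=d.shape)
    # mu.T.dot(dropout_layers) has shape (1, 1): the sum of mu_i over the "on" indicators
    pm.Normal("likelihood", mu=mu.T.dot(dropout_layers), sigma=1, observed=x)
    # Slice sampler for mu; PyMC assigns a step method to the imputed indicators itself
    step = pm.Slice([mu])
with dropout_network:
    trace = pm.sample(200, step=step, tune=200, chains=2)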
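To see why the imputed indicators are sticky, here is a minimal NumPy sketch (not PyMC's code) of Gibbs updates for the indicators in the model above, with mu held fixed at hypothetical values. The masked entries are revisited at every iteration, but each update draws from their conditional distribution given the current mu and the data, not from the Bernoulli(0.5) prior. PyMC3 typically assigns its own step method (BinaryGibbsMetropolis) to Bernoulli unknowns; this sketch swaps in a plain Gibbs update that targets the same conditional. The helpers `gibbs_indicator_sweeps` and `flip_rate` are hypothetical names.

```python
import numpy as np


def gibbs_indicator_sweeps(x, mu, n_sweeps=400, p=0.5, seed=0):
    """Gibbs-sample d_i | d_{-i}, mu, x for x_j ~ N(mu . d, 1), d_i ~ Bernoulli(p), mu fixed."""
    rng = np.random.default_rng(seed)
    d = (rng.random(mu.size) < p).astype(float)  # start from a prior draw
    trace = np.empty((n_sweeps, mu.size), dtype=int)
    for t in range(n_sweeps):
        for i in range(mu.size):
            rest = d @ mu - d[i] * mu[i]  # contribution of all other indicators
            # Gaussian log-likelihood (up to a constant) with d_i = 0 and with d_i = 1
            ll0 = -0.5 * np.sum((x - rest) ** 2)
            ll1 = -0.5 * np.sum((x - rest - mu[i]) ** 2)
            log_odds = np.log(p / (1 - p)) + ll1 - ll0
            # logistic(log_odds) written with tanh to avoid overflow for large |log_odds|
            prob1 = 0.5 * (1.0 + np.tanh(0.5 * log_odds))
            d[i] = float(rng.random() < prob1)
        trace[t] = d
    return trace


def flip_rate(trace):
    """Fraction of consecutive sweeps in which each indicator changed value."""
    return np.mean(trace[1:] != trace[:-1], axis=0)
```

For i.i.d. coin flips the flip rate would be about 0.5 per indicator. In a toy case with data centered at 3 and mu = [3.0, 0.0], the first indicator's log-odds are in the hundreds, so it locks at 1 and never flips, while the second (mu_i = 0, so the data carry no information about it) flips about half the time.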

schwarls37 · February 15, 2022, 8:19pm

Sorry, false alarm: in the posterior these random variables are conditionally dependent on the means (mu), so I was wrong to expect i.i.d. behavior!
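The dependence can be made explicit. Assuming the model in the code above (prior probability p = 0.5, likelihood sigma = 1, and likelihood mean s = Σ_k μ_k d_k over n observations x_j), the full conditional of one indicator d_i given everything else, and its log-odds, are:

```latex
p(d_i = 1 \mid d_{-i}, \mu, x)
  = \frac{p \prod_{j=1}^{n} \mathcal{N}(x_j;\, s_{-i} + \mu_i,\, 1)}
         {p \prod_{j=1}^{n} \mathcal{N}(x_j;\, s_{-i} + \mu_i,\, 1)
          + (1 - p) \prod_{j=1}^{n} \mathcal{N}(x_j;\, s_{-i},\, 1)},
\qquad s_{-i} = \sum_{k \neq i} \mu_k d_k

\log \frac{p(d_i = 1 \mid d_{-i}, \mu, x)}{p(d_i = 0 \mid d_{-i}, \mu, x)}
  = \log \frac{p}{1 - p} + \mu_i \sum_{j=1}^{n} \left( x_j - s_{-i} - \frac{\mu_i}{2} \right)
```

With p = 0.5 the first term vanishes, and with n = 100 observations near 10 the sum is roughly 100 μ_i (10 − s_{-i} − μ_i/2). Unless μ_i is near 0 (or s_{-i} + μ_i/2 is near the data mean), the conditional probability is pinned close to 0 or 1, so the indicator rarely changes between iterations.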

schwarls37 · February 16, 2022, 2:32pm

Of relevance here is this previous post, where @junpenglao notes that dropout is a model-fitting technique.

If I were to perform Metropolis sampling on my mu variables using samples (draws/proposals) of d from the Bernoulli prior, I would be performing Monte Carlo integration as part of my posterior sampling scheme. As a strategy, with a (quite wide) uniform proposal distribution for mu, this could overcome the posterior-symmetry identifiability problem that more standard MCMC techniques do not address (they generally get stuck in one of the symmetric posterior modes), but it would not address the fact that this model specification is a “selection prior”, not a “dropout” specification.
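One way to make "Metropolis on mu with Monte Carlo integration over prior draws of d" a valid MCMC scheme is pseudo-marginal Metropolis-Hastings, a named technique swapped in here rather than anything from the thread. The sketch below assumes the model from the code above (mu_i ~ N(0, 1), d_i ~ Bernoulli(0.5), x_j ~ N(mu · d, 1)); the function names, the uniform random-walk half-width of 2.0, and K = 256 prior draws are hypothetical choices.

```python
import numpy as np


def log_lik_given_d(mu, d, x):
    """log p(x | mu, d) for each row of d, with x_j ~ N(mu . d, 1)."""
    mean = d @ mu  # shape (K,): one likelihood mean per indicator draw
    resid = x[None, :] - mean[:, None]  # shape (K, n)
    return -0.5 * np.sum(resid**2, axis=1) - 0.5 * x.size * np.log(2 * np.pi)


def mc_log_marginal(mu, x, rng, p=0.5, K=256):
    """Log of an unbiased Monte Carlo estimate of p(x | mu), averaging over K prior draws of d."""
    d = (rng.random((K, mu.size)) < p).astype(float)  # draws from the Bernoulli prior
    ll = log_lik_given_d(mu, d, x)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))  # log-mean-exp, numerically stable


def log_prior(mu):
    return -0.5 * np.sum(mu**2)  # mu_i ~ N(0, 1), up to a constant


def pseudo_marginal_mh(x, width=10, n_iter=2000, half_width=2.0, seed=0):
    rng = np.random.default_rng(seed)
    mu = np.zeros(width)
    log_target = log_prior(mu) + mc_log_marginal(mu, x, rng)
    samples = np.empty((n_iter, width))
    accepted = 0
    for t in range(n_iter):
        # symmetric uniform random-walk proposal, so no Hastings correction is needed
        prop = mu + rng.uniform(-half_width, half_width, size=width)
        log_target_prop = log_prior(prop) + mc_log_marginal(prop, x, rng)
        if np.log(rng.random()) < log_target_prop - log_target:
            # keep the noisy estimate attached to the accepted state (do not recompute it)
            mu, log_target = prop, log_target_prop
            accepted += 1
        samples[t] = mu
    return samples, accepted / n_iter
```

Because the averaged likelihood is an unbiased estimate of p(x | mu) and the current state's estimate is reused rather than refreshed, the chain targets the exact marginal posterior of mu. As the post says, this changes how the posterior is explored, not the fact that the model is a selection prior.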
