How Can I Impute Missing Discrete Data Efficiently?

I have a hierarchical logistic regression with some missing data. Currently, I’m trying to just use the normal :
> with maskingModel:

trace = pm.sample(60000, chains=6, cores=6, tune=30000,

This returns the following:
> >NUTS: [Local Bias in Estimates, Campaign effects, Time effects, Constants, Local Bias Caused by Campaign, Var(Bias from Campaign), Bias from Campaign, Var(Est), Bias of Est, Effect Variance, Average Campaign Effect, Var(Time Effect), Average Time Effect, Var, Mean]

>Metropolis: [Counts_missing]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
There were 50 divergences after tuning. Increase `target_accept` or reparameterize.
There were 65 divergences after tuning. Increase `target_accept` or reparameterize.
There were 24 divergences after tuning. Increase `target_accept` or reparameterize.
There were 96 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.

So it looks like the data imputation didn’t go well. How can I change the number of samples and get rid of the divergences? Are there samplers other than Metropolis that can be used here? I think particle Gibbs should be doable here, but I don’t know whether/how PyMC3 implements this.

Are you sure the issue is the missing data imputation? Just as a test, you can perform an independent pre-imputation (say by column median) just to see if the model samples without missing data. It would also be helpful to see the model – or at least a description.

1 Like

Not sure about discrete observations, but FWIW I’ve a worked example up here that works just fine, maybe there’s an aspect of that model that can help influence your thoughts MRE_missing_values · GitHub

I’d suggest using a hierarchical prior on the missing data value to help constrain the discrete sampling to a reasonable value

Yep – when I drop missing values the code works perfectly. The problem just seems to be that Metropolis-Hastings has difficulty with this.

with pm.Model() as maskingModel:

# Hyperpriors
# Few people use masks in Cameroon, from correspondents
mu = pm.Normal("Mean", mu=-4.5, sigma=.5)
var = pm.InverseGamma("Var", alpha=4, beta=3 * .3**2)
sigma = pm.Deterministic("sd", var**.5)

# Informative priors/regularizations on variance --
# the intervention should be similarly effective in all villages.
timeMu = pm.Normal("Average Time Effect", mu=.6, sigma=.3)
timeVar = pm.InverseGamma("Var(Time Effect)", alpha=4, beta=3*.3**2)
timeSigma = pm.Deterministic("sd(Time Effect)", timeVar**.5)

# Regularizing prior:
effectMu = pm.Normal("Average Campaign Effect", mu=0, sigma=3)
effectVar = pm.InverseGamma("Effect Variance", alpha=4, beta=3*.3**2)
effectSigma = pm.Deterministic("sd(Campaign Effect)", effectVar**.5)

# Locals probably overestimate mask use --
# people tend to overestimate small numbers
estimateBias = pm.Normal("Bias of Est", mu=.5, sigma=1)
estimateVar = pm.InverseGamma("Var(Est)", alpha=2, beta=.4**2)
estimateSigma = pm.Deterministic("Std Err(Est)", estimateVar**.5)
campaignBiasMu = pm.Normal("Bias from Campaign", mu=.2, sigma=1)
campaignBiasVar = pm.InverseGamma("Var(Bias from Campaign)",
                                  alpha=2, beta=2*.3**2)
campaignBiasSigma = pm.Deterministic("Std Err(Bias from Campaign)",
campaignBias = pm.Normal("Local Bias Caused by Campaign",
                         mu=campaignBiasMu, sigma=campaignBiasSigma,

constant = pm.Normal("Constants", mu, sigma, shape=numVil)
randomChange = pm.Normal("Time effects", timeMu, timeSigma, shape=numVil)
campaignEffect = pm.Normal("Campaign effects", effectMu, effectSigma,

localBias = pm.Normal("Local Bias in Estimates", mu = estimateBias,
                      sd = estimateSigma, shape=numVil)

maskRate = pm.Deterministic("Masking Rate (Logit)",
                    + randomChange[data["vilIndex"]] *
                    + campaignEffect[data["vilIndex"]]
                    * data["followup"].values
                    * data["campaign"].values

estimates = pm.LogitNormal("Local Estimates", mu=
    maskRate[data["vilIndex"]] + localBias[data["vilIndex"]] +
    campaignBias[data["vilIndex"]] * data["campaign"] * data["followup"],
    sd=estimateSigma, observed=data.wearsMaskEst)

counts = pm.Binomial("Counts", 550, pm.invlogit(maskRate), shape=len(data),

I’m guessing data["maskCount"] has missing values – but I don’t see any discrete distribution being used except for the Binomial which defines the likelihood. Is the issue that some of data.maskCount are missing? If the response is missing data, then these points aren’t really contributing to the conditional likelihood P[count|x] since their count is missing; so I’d recommend fitting only with complete data, and imputing using the posterior; i.e.,

counts = pm.Binomial('Counts', 550, pm.invlogit(maskRate)[compl_idx], shape=compl_idx.shape[0],

tr = pm.sample()  # should use NUTS
counts_imut = pm.sample_posterior_predictive(tr, ...)

Yes, some of data.maskCount is missing. I’ll try this, but if imputing the data makes no difference to the likelihood, why does PyMC3 impute it automatically instead of just not updating on it?

Trying to do this returns the following error:
TypeError: TensorType does not support Python bools for indexing, such as tensor[[True, False]]. To use a boolean mask, convert the mask to a NumPy array first, e.g., tensor[numpy.array([True, False])].

The index either needs to be integers, or a numpy array of bools.

Ahh, thanks, I misunderstood the error message and was trying to convert the data itself to a numpy array. The model runs and fits correctly; is there any reason to prefer imputation over just not updating on missing data?

I don’t see that there is anything to gain, and yet imputation adds a (potentially very large) number of dimensions, and in this case makes you lose the ability to use NUTS.

Thanks; in that case, why is automatic imputation the default?

Probably because most cases missing data appears among predictors, i.e., P(Y|X_1,X_2)P(X_1,X_2) and are missing some X_2 but not Y. In this case you can’t “just use the posterior” to predict X_2, because really what you want to predict is X_2|Y, so you need to impute the values from the full joint distribution.

Ahh, ok, so in this case it doesn’t help because the variable I’m interested in is latent (The actual percentage of people wearing masks)? Thanks!

Hmm, ok, so I’m expanding my model a bit to include other variables and now need to sample many discrete missing data points. Is there a way to impute these using SMC or some other technique that can sample more effectively than MH?