How Can I Impute Missing Discrete Data Efficiently?

I have a hierarchical logistic regression with some missing data. Currently, I’m just trying to use the default sampler:
with maskingModel:
    trace = pm.sample(60000, chains=6, cores=6, tune=30000,
                      target_accept=0.99,
                      init="jitter+advi+adapt_diag")

This returns the following:
NUTS: [Local Bias in Estimates, Campaign effects, Time effects, Constants, Local Bias Caused by Campaign, Var(Bias from Campaign), Bias from Campaign, Var(Est), Bias of Est, Effect Variance, Average Campaign Effect, Var(Time Effect), Average Time Effect, Var, Mean]
Metropolis: [Counts_missing]
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 14 seconds.
There were 50 divergences after tuning. Increase `target_accept` or reparameterize.
There were 65 divergences after tuning. Increase `target_accept` or reparameterize.
There were 24 divergences after tuning. Increase `target_accept` or reparameterize.
There were 96 divergences after tuning. Increase `target_accept` or reparameterize.
The rhat statistic is larger than 1.05 for some parameters. This indicates slight problems during sampling.

So it looks like the data imputation didn’t go well. How can I change the number of samples and get rid of the divergences? Are there samplers other than Metropolis that can be used here? I think particle Gibbs should be doable here, but I don’t know whether/how PyMC3 implements this.

Are you sure the issue is the missing data imputation? As a test, you could perform an independent pre-imputation (say, by column median) to see whether the model samples fine once there is no missing data. It would also be helpful to see the model – or at least a description of it.
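If it helps, here is a minimal sketch of that kind of throwaway pre-imputation (assuming `data` is a pandas DataFrame and `maskCount` is the column with gaps; the median fill is purely a diagnostic, not a principled imputation):

import numpy as np

# diagnostic only: fill the missing counts with the column median so the model
# sees a complete dataset and the missing-data machinery never kicks in
data_filled = data.copy()
data_filled["maskCount"] = (
    data_filled["maskCount"]
    .fillna(data_filled["maskCount"].median())
    .round()
    .astype(int)
)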


Not sure about discrete observations, but FWIW I have a worked example up here that works just fine; maybe there’s an aspect of that model that can help inform your thinking: MRE_missing_values · GitHub

I’d suggest using a hierarchical prior on the missing data values to help constrain the discrete sampling to a reasonable range.
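For what it’s worth, here is a rough sketch of what that can look like (hypothetical names, not your model; `counts_obs` is a 1-D array with NaN where the count is missing). Passing a masked array as observed triggers PyMC3’s automatic imputation, and the hierarchical prior on the rate keeps the Metropolis proposals for the missing counts anchored:

import numpy as np
import pymc3 as pm

# hypothetical data: counts_obs has NaN where the count was not recorded
counts_masked = np.ma.masked_invalid(counts_obs)

with pm.Model():
    # hierarchical prior on the per-unit rate (logit scale)
    mu = pm.Normal("mu", mu=-4.5, sigma=1.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    theta = pm.Normal("theta", mu=mu, sigma=sigma, shape=len(counts_obs))

    # the masked entries become a "Counts_missing" variable sampled by Metropolis;
    # its proposals are constrained by the hierarchical structure above
    counts = pm.Binomial("Counts", n=550, p=pm.invlogit(theta),
                         observed=counts_masked)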

Yep – when I drop missing values the code works perfectly. The problem just seems to be that Metropolis-Hastings has difficulty with this.

with pm.Model() as maskingModel:

    # Hyperpriors
    # Few people use masks in Cameroon, from correspondents
    mu = pm.Normal("Mean", mu=-4.5, sigma=.5)
    var = pm.InverseGamma("Var", alpha=4, beta=3 * .3**2)
    sigma = pm.Deterministic("sd", var**.5)

    # Informative priors/regularizations on variance --
    # the intervention should be similarly effective in all villages.
    timeMu = pm.Normal("Average Time Effect", mu=.6, sigma=.3)
    timeVar = pm.InverseGamma("Var(Time Effect)", alpha=4, beta=3 * .3**2)
    timeSigma = pm.Deterministic("sd(Time Effect)", timeVar**.5)

    # Regularizing prior:
    effectMu = pm.Normal("Average Campaign Effect", mu=0, sigma=3)
    effectVar = pm.InverseGamma("Effect Variance", alpha=4, beta=3 * .3**2)
    effectSigma = pm.Deterministic("sd(Campaign Effect)", effectVar**.5)

    # Locals probably overestimate mask use --
    # people tend to overestimate small numbers
    estimateBias = pm.Normal("Bias of Est", mu=.5, sigma=1)
    estimateVar = pm.InverseGamma("Var(Est)", alpha=2, beta=.4**2)
    estimateSigma = pm.Deterministic("Std Err(Est)", estimateVar**.5)
    campaignBiasMu = pm.Normal("Bias from Campaign", mu=.2, sigma=1)
    campaignBiasVar = pm.InverseGamma("Var(Bias from Campaign)",
                                      alpha=2, beta=2 * .3**2)
    campaignBiasSigma = pm.Deterministic("Std Err(Bias from Campaign)",
                                         campaignBiasVar**.5)
    campaignBias = pm.Normal("Local Bias Caused by Campaign",
                             mu=campaignBiasMu, sigma=campaignBiasSigma,
                             shape=numVil)

    constant = pm.Normal("Constants", mu, sigma, shape=numVil)
    randomChange = pm.Normal("Time effects", timeMu, timeSigma, shape=numVil)
    campaignEffect = pm.Normal("Campaign effects", effectMu, effectSigma,
                               shape=numVil)

    localBias = pm.Normal("Local Bias in Estimates", mu=estimateBias,
                          sigma=estimateSigma, shape=numVil)

    maskRate = pm.Deterministic("Masking Rate (Logit)",
                                constant[data["vilIndex"]]
                                + randomChange[data["vilIndex"]]
                                * data["followup"].values
                                + campaignEffect[data["vilIndex"]]
                                * data["followup"].values
                                * data["campaign"].values)

    estimates = pm.LogitNormal("Local Estimates",
                               mu=maskRate[data["vilIndex"]]
                               + localBias[data["vilIndex"]]
                               + campaignBias[data["vilIndex"]]
                               * data["campaign"] * data["followup"],
                               sigma=estimateSigma,
                               observed=data.wearsMaskEst)

    counts = pm.Binomial("Counts", 550, pm.invlogit(maskRate),
                         shape=len(data), observed=data.maskCount)

I’m guessing data["maskCount"] has missing values – but I don’t see any discrete distribution being used except for the Binomial that defines the likelihood. Is the issue that some entries of data.maskCount are missing? If it’s the response that is missing, those rows don’t really contribute to the conditional likelihood P[count | x], since their count is unobserved. So I’d recommend fitting with the complete rows only, and imputing the missing ones from the posterior afterwards; i.e.,

counts = pm.Binomial('Counts', 550, pm.invlogit(maskRate)[compl_idx],
                     shape=compl_idx.shape[0],
                     observed=data.maskCount[compl_idx])

tr = pm.sample()  # should use NUTS
counts_imut = pm.sample_posterior_predictive(tr, ...)
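For concreteness, here is one way to do the “impute using the posterior” step after that fit – a sketch that assumes `miss_idx` holds the integer positions of the rows with missing counts and reuses the "Masking Rate (Logit)" Deterministic stored in the trace:

import numpy as np

miss_idx = np.flatnonzero(data.maskCount.isna().to_numpy())    # rows to impute

# posterior draws of the per-row rate on the logit scale, shape (n_draws, n_missing)
rate_draws = tr["Masking Rate (Logit)"][:, miss_idx]
p_draws = 1.0 / (1.0 + np.exp(-rate_draws))                    # inverse-logit

# posterior-predictive counts for the missing rows
imputed_counts = np.random.binomial(n=550, p=p_draws)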

Yes, some entries of data.maskCount are missing. I’ll try this, but if imputing the data makes no difference to the likelihood, why does PyMC3 impute it automatically instead of just not updating on it?

Trying to do this returns the following error:
TypeError: TensorType does not support Python bools for indexing, such as tensor[[True, False]]. To use a boolean mask, convert the mask to a NumPy array first, e.g., tensor[numpy.array([True, False])].

The index needs to be either an array of integers or a NumPy array of booleans.
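For example (a small sketch, assuming `data.maskCount` is the column with the missing entries):

import numpy as np

# either a NumPy boolean mask ...
compl_mask = data.maskCount.notna().to_numpy()

# ... or the equivalent integer positions
compl_idx = np.flatnonzero(compl_mask)

# both of these can index a tensor; a plain Python list of bools cannot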

Ahh, thanks, I misunderstood the error message and was trying to convert the data itself to a numpy array. The model runs and fits correctly; is there any reason to prefer imputation over just not updating on missing data?

I don’t see that there is anything to gain; imputation adds a (potentially very large) number of dimensions, and in this case it makes you lose the ability to use NUTS.

Thanks; in that case, why is automatic imputation the default?

Probably because in most cases the missing data appears among the predictors, i.e., you have P(Y|X_1,X_2)P(X_1,X_2) with some X_2 missing but Y observed. In that case you can’t “just use the posterior” to predict X_2, because what you really want to predict is X_2|Y, so you need to impute the values from the full joint distribution.
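As a toy sketch of that situation (hypothetical names, not your model; y_obs is a fully observed response): if a predictor x2_raw has NaNs, wrapping it in a masked array makes the missing entries latent variables that are updated jointly with everything else, so they end up informed by Y:

import numpy as np
import pymc3 as pm

x2_masked = np.ma.masked_invalid(x2_raw)   # NaNs become a latent "X2_missing"

with pm.Model():
    beta = pm.Normal("beta", mu=0, sigma=1)
    x2 = pm.Normal("X2", mu=0, sigma=1, observed=x2_masked)
    y = pm.Normal("Y", mu=beta * x2, sigma=1, observed=y_obs)
    # the imputed X2_missing values are drawn from X2 | Y, beta,
    # not just from the prior on X2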

Ahh, ok, so in this case it doesn’t help because the variable I’m interested in is latent (the actual percentage of people wearing masks)? Thanks!

Hmm, ok, so I’m expanding my model a bit to include other variables and now need to sample many discrete missing data points. Is there a way to impute these using SMC or some other technique that can sample more effectively than MH?