Imputation for partially observed random variables with positive support

npschafer · February 25, 2021, 5:38pm

I’m looking to do imputation of partially observed random variables with positive support. Everything seems to be going fine for distributions that have support on the entire real line, like the normal distribution. Here’s the unobserved case:

import pymc3 as pm

# Two fully unobserved standard-normal variables
with pm.Model() as model:
    pm.Normal("unobserved", mu=0., sigma=1, shape=(2,))
    map_estimate = pm.find_MAP()

print(map_estimate)
print(model.test_point)

{'unobserved': array([0., 0.])}
{'unobserved': array([0., 0.])}

And the partially observed case:

import numpy as np
import pymc3 as pm
from numpy.ma import masked_values

# Entries equal to MASK_VALUE are treated as missing and imputed by PyMC3
MASK_VALUE = -999.
observed_values = np.array([1., MASK_VALUE], dtype=float)
with pm.Model() as model:
    pm.Normal("partially_observed", mu=0, sigma=1, observed=masked_values(observed_values, value=MASK_VALUE), shape=(2,))
    map_estimate = pm.find_MAP()
print(map_estimate)
print(model.test_point)

{'partially_observed_missing': array([0.])}
{'partially_observed_missing': array([0.])}
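For reference, this is what `masked_values` builds from the observed data above. The masked entry is the one that PyMC3 turns into the `partially_observed_missing` variable. The following is a minimal numpy-only sketch:

```python
import numpy as np
from numpy.ma import masked_values

MASK_VALUE = -999.
observed_values = np.array([1., MASK_VALUE], dtype=float)

# Entries equal to MASK_VALUE are masked (treated as missing); the rest are observed
masked = masked_values(observed_values, value=MASK_VALUE)

print(masked.mask)          # True marks the entry PyMC3 will impute
print(masked.compressed())  # only the observed entries
```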

The unobserved gamma distribution case also seems fine:

import pymc3 as pm

# Two fully unobserved gamma variables (positive support)
with pm.Model() as model:
    pm.Gamma("unobserved", alpha=3., beta=1/0.1, shape=(2,))
    map_estimate = pm.find_MAP()
print(map_estimate)
print(model.test_point)

{'unobserved_log__': array([-1.60943927, -1.60943927]), 'unobserved': array([0.19999973, 0.19999973])}
{'unobserved_log__': array([-1.2039728, -1.2039728])}
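These numbers can be checked by hand. Assume that PyMC3 starts `Gamma` at its mean (alpha/beta = 0.3) and that `find_MAP` reports the mode of the density on the original scale ((alpha - 1)/beta = 0.2). Under those assumptions, the `_log__` entries are just the logs of these two values:

```python
import math

alpha, beta = 3.0, 1 / 0.1  # same parameters as the model above

mean = alpha / beta          # 0.3, assumed default starting value (test point)
mode = (alpha - 1) / beta    # 0.2, valid because alpha > 1

# The free variable lives on the log scale, hence the `_log__` keys
print(math.log(mean))  # about -1.20397, matching the test point
print(math.log(mode))  # about -1.60944, matching the MAP up to optimizer tolerance
```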

But the partially observed gamma distribution case converges to a point that is outside of the support of the distribution:

import numpy as np
import pymc3 as pm
from numpy.ma import masked_values

# Same masking as in the normal case, but with a positive-support gamma
MASK_VALUE = -999.
observed_values = np.array([1., MASK_VALUE], dtype=float)
with pm.Model() as model:
    pm.Gamma("partially_observed", alpha=3., beta=1/0.1, observed=masked_values(observed_values, value=MASK_VALUE))
    map_estimate = pm.find_MAP()
print(map_estimate)
print(model.test_point)

{'partially_observed_missing': array([-0.7])}
{'partially_observed_missing': array([0.3])}
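The following rough sketch shows why the scale matters. It uses scipy only and does not reproduce PyMC3's internals. On the original scale, the gamma log-density is `-inf` at a value like -0.7. Optimizing over `y = log(x)` instead keeps every candidate inside the support. As in the unobserved case, the Jacobian term is left out, so the optimum is the mode of the original density.

```python
import numpy as np
from scipy import optimize, stats

alpha, beta = 3.0, 1 / 0.1
dist = stats.gamma(a=alpha, scale=1 / beta)  # scipy uses scale = 1/beta

# On the original scale, the value found above has zero density
print(dist.logpdf(-0.7))  # -inf

# Optimize over the unconstrained y = log(x); exp(y) is always > 0
def neg_logp(y):
    return -dist.logpdf(np.exp(y[0]))

res = optimize.minimize(neg_logp, x0=[np.log(0.3)])
print(np.exp(res.x[0]))  # close to 0.2, the mode, inside the support
```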

I think it might be this same issue that is causing variational inference to fail. Any ideas about what the underlying issue is? All help is very much appreciated.

ckrapu · February 26, 2021, 7:10pm

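It looks like this happens because support-constrained variables are usually not sampled on their original scale. A free `Gamma` is optimized and sampled on the log scale, which is why the unobserved case reports an `unobserved_log__` key. The imputed `partially_observed_missing` variable has no such `_log__` counterpart, so nothing stops the optimizer from leaving the support. I reproduced your issue with lognormal and beta-distributed random variates as well. It would be worth opening an issue on the PyMC3 GitHub for this.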
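One possible workaround is to split the data using the mask. The observed entries stay as data, and the missing entries are declared as an ordinary free variable, e.g. `pm.Gamma("imputed", alpha=3., beta=1/0.1, shape=n_missing)`. Like the unobserved case above, that variable should get the default log transform. This is an untested sketch, not a confirmed fix. Only the numpy bookkeeping is shown here. The names `observed_part`, `n_missing`, `missing_idx`, and `imputed` are hypothetical.

```python
import numpy as np
from numpy.ma import masked_values

MASK_VALUE = -999.
observed_values = np.array([1., MASK_VALUE], dtype=float)
masked = masked_values(observed_values, value=MASK_VALUE)
mask = np.ma.getmaskarray(masked)  # full boolean mask, even if nothing is masked

# Hypothetical split: observed entries -> data for one Gamma node,
# masked entries -> a free (log-transformed) Gamma node of size n_missing
observed_part = masked.compressed()
n_missing = int(mask.sum())
missing_idx = np.flatnonzero(mask)

# Reassemble a full vector from a hypothetical imputed value
imputed = np.array([0.2])  # placeholder, e.g. the MAP of the free node
full = observed_values.copy()
full[missing_idx] = imputed
print(full)
```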
