Automatic imputation in multivariate models

I am trying to use automatic imputation with a multivariate model, and it seems that while multivariate imputation used to be supported, it currently is not.

For example, using this notebook: [pymcon] Missing value handling.ipynb · GitHub


loc = np.array([-2., 4.])
scale = np.array([2., 3.])
corr = np.array([[1., .65],
                 [.65, 1.]])
cov = np.outer(scale, scale) * corr
chol = np.linalg.cholesky(cov)

n = 500
sample = np.einsum(
    "ij,...j->...i",
    chol, np.random.randn(n, *loc.shape)
) + loc

n_mask = 50
row_idx = np.random.choice(n, n_mask, replace=False)
col_idx = np.random.binomial(1, .5, size=n_mask)
mask = np.zeros(sample.shape)
mask[row_idx, col_idx] = np.nan
sample_masked = np.ma.array(sample, mask=np.isnan(mask))

masked_element_idx = np.stack(np.where(sample_masked.mask))
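As a quick sanity check on the masking logic above (illustrative only; data here is drawn uncorrelated and seeded for reproducibility, which is an assumption not in the original), the masked array should report exactly `n_mask` missing entries, since each chosen row gets one masked column:

```python
import numpy as np

n, n_mask = 500, 50
rng = np.random.default_rng(0)  # seeded for reproducibility (assumption)
sample = rng.normal(size=(n, 2))

# One masked entry per chosen row, in a random column
row_idx = rng.choice(n, n_mask, replace=False)
col_idx = rng.binomial(1, 0.5, size=n_mask)
mask = np.zeros(sample.shape)
mask[row_idx, col_idx] = np.nan
sample_masked = np.ma.array(sample, mask=np.isnan(mask))

print(sample_masked.mask.sum())  # → 50 entries flagged as missing
```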

with pm.Model() as m4:
    mu = pm.Normal('mu', mu=0, sigma=10, shape=2)
    sd_dist = pm.HalfCauchy.dist(2.5)
    chol_cov, *_ = pm.LKJCholeskyCov(
        'chol_cov', n=2, eta=1, sd_dist=sd_dist,
        compute_corr=True, store_in_trace=False)
    obs = pm.MvNormal(
        'obs', mu=mu, chol=chol_cov, observed=sample_masked)
    trace4 = pm.sample(2000, cores=4, return_inferencedata=True)

I am getting the following error:
NotImplementedError: Automatic inputation is only supported for univariate RandomVariables, but obs is multivariate.

I am currently running pymc 5.0.0.

Furthermore, if imputation for multivariate random variables has been removed from PyMC, I was wondering how I should deal with this.

cc @ricardoV94

We don’t have an API in place yet, but you can do it easily enough with a Potential:

import numpy as np
import pymc as pm
import pytensor.tensor as pt  # PyMC 5 uses PyTensor (formerly Aesara)

# Toy data: a (3, 2) array with four missing entries
data = np.zeros((3, 2))
data[[0, 1, 2, 2], [0, 1, 0, 1]] = np.nan
print(data)

with pm.Model() as m:
    # Create a vector of flat variables for the unobserved components of the MvNormal
    x_unobs = pm.Flat("x_unobs", shape=(np.isnan(data).sum(),))

    # Create the symbolic value of x, combining observed data and unobserved variables
    x = pt.as_tensor(data)
    x = pm.Deterministic("x", pt.set_subtensor(x[np.isnan(data)], x_unobs))

    # Add a Potential with the logp of the variable conditioned on `x`
    pm.Potential("x_logp", pm.logp(rv=pm.MvNormal.dist([0, 0], [[1, 0], [0, 1]]), value=x))
    idata = pm.sample()

Thank you! I was able to implement this in my code and got the solution I was expecting.

I am not very familiar with pm.Flat and pm.Potential and would really like to learn more about them, if you have anything you could direct me to.

Flat is just a way to sample a variable without an intrinsic prior, and Potential is a way to directly add a log-probability term to the model.