Calculating conditional posterior predictive samples in high-dimensional observation spaces

I think GP would be the tool if my assumptions about spatial dependence were right, but they’re not :wink:

Thanks for sharing the model. The reason I was asking so many questions about modeled dependencies between the detectors is that, after the model is fit, new "observed" data doesn't matter unless you're updating the estimated model parameters. If you're just making a prediction, the information about how the detectors relate to each other has to live somewhere in the model.

An illustration with linear regression: suppose you have a univariate regression, y_i \sim N(\alpha + \beta \cdot x_i, \sigma). You provide y and x as data and estimate posterior distributions for \alpha, \beta, and \sigma. Now you can make predictions for an unseen y_i using \hat y_i = \alpha + \beta \cdot x_i, because each new x_i brings new information into the prediction.
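For concreteness, a minimal sketch of that workflow (the synthetic data and names like x_data and lm are just for illustration, and it assumes a reasonably recent PyMC version):

import numpy as np
import pymc as pm

# Made-up data, just for illustration
rng = np.random.default_rng(1234)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model() as lm:
    x_data = pm.MutableData('x_data', x)
    alpha = pm.Normal('alpha')
    beta = pm.Normal('beta')
    sigma = pm.Exponential('sigma', 1)
    # shape=x_data.shape lets the likelihood track the size of x_data when we swap it out
    y_obs = pm.Normal('y_obs', mu=alpha + beta * x_data, sigma=sigma, observed=y, shape=x_data.shape)
    lm_idata = pm.sample()

with lm:
    # Predictions for unseen y depend on the new x values we pass in; the new data
    # enters the prediction through x, not through y.
    pm.set_data({'x_data': np.array([-1.0, 0.0, 1.0])})
    lm_preds = pm.sample_posterior_predictive(lm_idata)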

In your case, you have y_i \sim N(\mu, \Sigma), and no corresponding x_i. So for any unseen y_i, your estimate is the corresponding component of \mu. As an illustration, I use the pm.set_data API to do posterior prediction using different observed data values. Here is your model, adjusted to use labeled dimensions:

# 2. Construct PyMC model, sample posterior and construct posterior predictive
coords = {'detector':np.arange(ndetectors, dtype='int'),
          'detector_aux':np.arange(ndetectors, dtype='int')}

with pm.Model(coords=coords) as model:
    model.add_coord('time', np.arange(nsamples, dtype='int'), mutable=True)
    
    obs_data = pm.MutableData('obs_data', obs, dims=['time', 'detector'])
    sd_dist = pm.Exponential.dist(1/0.1)
    chol, corr, sigmas = pm.LKJCholeskyCov('chol', eta=1, n=ndetectors, sd_dist=sd_dist, compute_corr=True)
    
    # This will be useful later, maybe
    cov = pm.Deterministic('cov', chol @ chol.T, dims=['detector', 'detector_aux'])
    
    mu = pm.Exponential('mu',lam=1/obs_mean, dims=['detector'])
    D = pm.MvNormal('D',mu=mu, chol=chol, observed=obs_data, dims=['time', 'detector'])
    idata = pm.sample(idata_kwargs={"dims": {"chol_stds": ["detector"], "chol_corr": ["detector", "detector_aux"]}})

    pm.sample_posterior_predictive(idata, extend_inferencedata=True)

Now we can do posterior predictive sampling with any values of obs_data we desire. I set the value of all the detectors to one billion, for comedic effect:

with model:
    pm.set_data({'obs_data':np.array([[1e9, 1e9, 1e9]])}, coords={'time':[0]})
    missing_idata = pm.sample_posterior_predictive(idata)
    
fig, ax = plt.subplots(1, 3, figsize=(14,4), dpi=144)
for d in coords['detector']:
    missing_idata.posterior_predictive.D.stack(sample=['chain', 'draw']).sel(detector=d, time=0).plot.hist(bins=25, ax=fig.axes[d])

You can verify for yourself that you can plug anything into the data and you will get the same posterior predictive. This is because, now that the model is fit, the posterior predictive depends only on the estimated parameters, not on the observed data.
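For example, swapping in a completely different (equally silly) value and re-running the posterior predictive gives the same per-detector distributions, up to Monte Carlo error:

with model:
    pm.set_data({'obs_data': np.array([[-1e9, -1e9, -1e9]])}, coords={'time': [0]})
    other_idata = pm.sample_posterior_predictive(idata)

# Per-detector means match the run above, up to sampling noise
print(missing_idata.posterior_predictive.D.mean(dim=['chain', 'draw', 'time']).values)
print(other_idata.posterior_predictive.D.mean(dim=['chain', 'draw', 'time']).values)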

I think all this might be missing the point, but I want to cover all my bases. What you might want to do is use the properties of the multivariate normal to predict the distribution of the unobserved detector, conditional on the observed ones. Specifically, if you have:

D = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}

And suppose that d_1 is unobserved. Partition D into unobserved and observed parts, D = \begin{bmatrix} d_u \\ d_o \end{bmatrix} (u for unobserved, o for observed), and partition your estimated \mu and \Sigma the same way, so that \mu = \begin{bmatrix} \mu_u \\ \mu_o \end{bmatrix} and \Sigma = \begin{bmatrix} \Sigma_{uu} & \Sigma_{uo} \\ \Sigma_{ou} & \Sigma_{oo} \end{bmatrix}, where each \Sigma block comes from the estimated covariance matrix. Then it is known that:

d_u \mid d_o \sim N(\hat \mu_{u}, \hat \Sigma_{uu}), with

\begin{align} \hat \mu_{u} &= \mu_u + \beta (d_o - \mu_o)\\ \hat \Sigma_{uu} &= \Sigma_{uu} - \beta \Sigma_{oo} \beta^T \\ \beta &= \Sigma_{uo} \Sigma_{oo}^{-1} \end{align}
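To see these formulas in action, here is a plain-NumPy sketch of the conditioning step for a single draw of \mu and \Sigma (the numbers are made-up placeholders; the labeled-dimension code below does the same computation across every posterior draw at once):

import numpy as np

# One hypothetical posterior draw of mu (3,) and Sigma (3, 3)
mu_draw = np.array([200.0, 200.0, 170.0])
Sigma_draw = np.array([[25.0, 10.0,  8.0],
                       [10.0, 30.0, 12.0],
                       [ 8.0, 12.0, 20.0]])

unobs, obs = [0], [1, 2]          # detector 0 unobserved, detectors 1 and 2 observed
d_o = np.array([195.0, 166.0])    # observed values

beta = Sigma_draw[np.ix_(unobs, obs)] @ np.linalg.inv(Sigma_draw[np.ix_(obs, obs)])
mu_u_hat = mu_draw[unobs] + beta @ (d_o - mu_draw[obs])
Sigma_u_hat = Sigma_draw[np.ix_(unobs, unobs)] - beta @ Sigma_draw[np.ix_(obs, obs)] @ beta.T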

So all you have to do is use the elements of your posterior (not the posterior predictive!) to compute these quantities. Let me preface this next code block by saying that it's probably not the most intuitive thing in the world if you're not used to working with labeled dimensions (which I'm not; it took me a bit of googling to figure it all out). The important thing is just that you implement those equations. For me, np.einsum would be a bit more intuitive, but I wanted to make an effort to use xarray.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

post = az.extract(idata, 'posterior')

# In the general case you need to sort these into observed and unobserved, 
# but I assume that the 0th element is unobserved and the others are observed.
Σ = post.cov
μ = post.mu

# Rename all the dims to make it (slightly) more obvious how the broadcasting works with xarray 
μ_u = μ.sel(detector=0).rename({'detector':'detector_unobs'})
μ_o = μ.sel(detector=[1,2]).rename({'detector':'detector_obs'})

Σ_uu = Σ.sel(detector=[0], detector_aux=[0]).rename({'detector':'detector_unobs', 'detector_aux':'detector_unobs_aux'})
Σ_ou = Σ.sel(detector=[1,2], detector_aux=[0]).rename({'detector':'detector_obs', 'detector_aux':'detector_unobs'})
Σ_uo = Σ.sel(detector=[0], detector_aux=[1,2]).rename({'detector':'detector_unobs', 'detector_aux':'detector_obs'})
Σ_oo = Σ.sel(detector=[1,2], detector_aux=[1,2]).rename({'detector':'detector_obs', 'detector_aux':'detector_obs_aux'})

# This is not at all obvious how to do, I followed this:
# https://github.com/pydata/xarray/discussions/7503
Σ_oo_inv = xr.apply_ufunc(np.linalg.inv,
                          Σ_oo, 
                        input_core_dims=[["detector_obs", "detector_obs_aux"]],
                        output_core_dims=[["detector_obs", "detector_obs_aux"]],
                        exclude_dims=set(("detector_obs", "detector_obs_aux")))

β = Σ_uo.dot(Σ_oo_inv, dims=['detector_obs']).rename({'detector_obs_aux':'detector_obs'})

new_obs = np.array([195, 166])[:, None]
μ_u_hat = μ_u + β.dot(new_obs - μ_o, dims=['detector_obs'])
Σ_u_hat = Σ_uu - β.dot(Σ_oo, dims=['detector_obs']).dot(β.rename({'detector_obs':'detector_obs_aux'}), dims=['detector_obs_aux'])


# Since the unobserved part is 1d, I use np.random.normal. In the general case, use
# np.random.multivariate_normal and concatenate the results (see the sketch after this block).
# Note that Σ_u_hat is a variance, so take its square root to get the scale np.random.normal expects.
d_u = np.array([np.random.normal(loc=mu, scale=np.sqrt(sigma)) for mu, sigma in zip(μ_u_hat.values.squeeze(), Σ_u_hat.values.squeeze())])
plt.hist(d_u, bins=100);
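As an aside on the general case mentioned in the comment above: with more than one unobserved detector, μ_u_hat and Σ_u_hat are built from the block partition in exactly the same way, and you then draw one multivariate normal sample per posterior draw. A sketch, assuming the dimension names follow the ones used above:

# Assumes μ_u_hat has dims (detector_unobs, sample) and Σ_u_hat has dims
# (detector_unobs, detector_unobs_aux, sample)
rng = np.random.default_rng()
mu_vals = μ_u_hat.transpose('detector_unobs', 'sample').values                          # (k, n_samples)
cov_vals = Σ_u_hat.transpose('detector_unobs', 'detector_unobs_aux', 'sample').values   # (k, k, n_samples)

d_u_general = np.stack([rng.multivariate_normal(mean=mu_vals[:, i], cov=cov_vals[:, :, i])
                        for i in range(mu_vals.shape[1])])   # shape (n_samples, k)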

Back in the one-dimensional case, plotting d_u gives the following histogram:

[histogram of the conditional posterior samples for detector 0]

Evidently, seeing the values 195 and 166 on detectors 1 and 2 causes the expected value of detector 0 to go down by about 3 units (compare this to the posterior predictive shown above). I would say that is expected, because the values 195 and 166 are both below the means for detectors 1 and 2 (again, compare to above), and looking at the posterior mean of the covariance matrix, detector 0 has positive covariance with both 1 and 2. Thus, they should all go up or down together:

import seaborn as sns
sns.heatmap(post.cov.mean(dim=['sample']).to_dataframe().unstack('detector_aux')['cov'], annot=True)

[heatmap of the posterior mean covariance matrix between detectors]

Now, is all this necessary? By that I mean, "can't PyMC just do this for us?" Fair question, and I'm not sure. As @julien noted, this conditioning of unobserved components on observed ones is exactly what Gaussian Processes do, but that's more of a special case. I'm also not positive whether this amounts to conjugate updating of the posterior or something else. But I think it might be what you have in mind?
