I think GP would be the tool if my assumptions about spatial dependence were right, but they're not
Thanks for sharing the model. The reason I was asking so many questions about modeled dependencies between the detectors is that after the model is fit, new "observed" data doesn't matter unless you're updating the estimated model parameters. If you're just making a prediction, the information about the detectors has to live somewhere in the model.
An illustration with linear regression: suppose you have a univariate regression, y_i \sim N(\alpha + \beta \cdot x_i, \sigma). You provide y and X as data and estimate posterior distributions for \alpha, \beta, \sigma. Now you can make predictions for an unseen y_i using \hat y_i = \alpha + \beta \cdot x_i.
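Just to make that workflow concrete, here is a minimal sketch (the data, priors, and variable names are made up for illustration, not taken from your model) of fitting such a regression and then predicting at new x values with pm.set_data:

import numpy as np
import pymc as pm

# Toy data for y_i ~ N(alpha + beta * x_i, sigma)
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

with pm.Model() as lin_model:
    x_data = pm.MutableData('x_data', x)
    alpha = pm.Normal('alpha', 0, 10)
    beta = pm.Normal('beta', 0, 10)
    sigma = pm.Exponential('sigma', 1)
    # shape follows x_data, so the likelihood resizes when new predictors are swapped in
    pm.Normal('y', mu=alpha + beta * x_data, sigma=sigma, observed=y, shape=x_data.shape)
    lin_idata = pm.sample()

with lin_model:
    # Predictions for unseen x: the model maps x to y through alpha and beta
    pm.set_data({'x_data': np.array([-1.0, 0.0, 1.0])})
    preds = pm.sample_posterior_predictive(lin_idata, predictions=True)

The point is that the new x values carry information into the prediction; in your MvNormal model there is no such input to swap.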
In your case, you have y_i \sim N(\mu, \Sigma) and no corresponding x_i, so for any unseen y_i your estimate is just the corresponding component of \mu. As an illustration, I use the pm.set_data API to do posterior prediction with different observed data values. Here is your model, adjusted to use labeled dimensions:
# 2. Construct PyMC model, sample posterior and construct posterior predictive
coords = {'detector': np.arange(ndetectors, dtype='int'),
          'detector_aux': np.arange(ndetectors, dtype='int')}

with pm.Model(coords=coords) as model:
    model.add_coord('time', np.arange(nsamples, dtype='int'), mutable=True)
    obs_data = pm.MutableData('obs_data', obs, dims=['time', 'detector'])

    sd_dist = pm.Exponential.dist(1 / 0.1)
    chol, corr, sigmas = pm.LKJCholeskyCov('chol', eta=1, n=ndetectors, sd_dist=sd_dist, compute_corr=True)

    # This will be useful later, maybe
    cov = pm.Deterministic('cov', chol @ chol.T, dims=['detector', 'detector_aux'])

    mu = pm.Exponential('mu', lam=1/obs_mean, dims=['detector'])
    D = pm.MvNormal('D', mu=mu, chol=chol, observed=obs_data, dims=['time', 'detector'])

    idata = pm.sample(idata_kwargs={"dims": {"chol_stds": ["detector"], "chol_corr": ["detector", "detector_aux"]}})
    pm.sample_posterior_predictive(idata, extend_inferencedata=True)
Now we can do posterior predictive sampling with any values of obs we desire. I set the value of all the detectors to one billion, for comedic effect:
with model:
    pm.set_data({'obs_data': np.array([[1e9, 1e9, 1e9]])}, coords={'time': [0]})
    missing_idata = pm.sample_posterior_predictive(idata)

fig, ax = plt.subplots(1, 3, figsize=(14, 4), dpi=144)
for d in coords['detector']:
    missing_idata.posterior_predictive.D.stack(sample=['chain', 'draw']).sel(detector=d, time=0).plot.hist(bins=25, ax=fig.axes[d])
You can verify for yourself that you can plug anything into the data and you will get the same posterior predictive. This is because, once the model is fit, the posterior predictive is independent of the observed data.
I think all this might be missing the point, but I want to cover all my bases. What you might want to do is use the properties of the multivariate normal to compute the conditional distribution of the unobserved detector given the observed ones. Specifically, if you have:
D = \begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix}
Suppose that d_1 is unobserved. Partition D into unobserved and observed parts, D = \begin{bmatrix} d_u \\ d_o \end{bmatrix} (u for unobserved and o for observed). Similarly partition your estimated mean and covariance, so that \mu = \begin{bmatrix} \mu_u \\ \mu_o \end{bmatrix} and \Sigma = \begin{bmatrix} \Sigma_{uu} & \Sigma_{uo} \\ \Sigma_{ou} & \Sigma_{oo} \end{bmatrix}, where each \Sigma_{xx} is a block of the estimated covariance matrix. Then it is known that:
d_u \sim N(\hat \mu_{u}, \hat \Sigma_{uu}), with
\begin{align} \hat \mu_{u} &= \mu_u + \beta (d_o - \mu_o)\\
\hat \Sigma_{uu} &= \Sigma_{uu} - \beta \Sigma_{oo} \beta^T \\
\beta &= \Sigma_{uo} \Sigma_{oo}^{-1} \end{align}
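If it helps to see those three equations outside of xarray first, here is a quick numpy sketch on a made-up 3-detector mean and covariance (the numbers are invented, only the formulas matter), with detector 0 as the unobserved one:

import numpy as np

# Hypothetical posterior-mean values for mu and Sigma; detector 0 is unobserved
mu = np.array([200.0, 198.0, 170.0])
Sigma = np.array([[4.0, 1.5, 1.0],
                  [1.5, 3.0, 0.8],
                  [1.0, 0.8, 2.5]])

u, o = [0], [1, 2]               # index sets: unobserved, observed
d_o = np.array([195.0, 166.0])   # values seen on the observed detectors

beta = Sigma[np.ix_(u, o)] @ np.linalg.inv(Sigma[np.ix_(o, o)])
mu_u_hat = mu[u] + beta @ (d_o - mu[o])                                   # conditional mean
Sigma_uu_hat = Sigma[np.ix_(u, u)] - beta @ Sigma[np.ix_(o, o)] @ beta.T  # conditional covariance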
So all you have to do is use the elements of your posterior (not the posterior predictive!) to compute these quantities. Let me preface this next code block by saying that it's probably not the most intuitive thing in the world if you're not used to working with labeled dimensions (which I'm not; it took me a bit of googling to figure it all out). The important thing is just that you implement those equations. For me, np.einsum would be a bit more intuitive, but I wanted to make an effort to use xarray.
import arviz as az
import xarray as xr

post = az.extract(idata, 'posterior')

# In the general case you need to sort these into observed and unobserved,
# but I assume that the 0th element is unobserved and the others are observed.
Σ = post.cov
μ = post.mu

# Rename all the dims to make it (slightly) more obvious how the broadcasting works with xarray
μ_u = μ.sel(detector=0).rename({'detector': 'detector_unobs'})
μ_o = μ.sel(detector=[1, 2]).rename({'detector': 'detector_obs'})

Σ_uu = Σ.sel(detector=[0], detector_aux=[0]).rename({'detector': 'detector_unobs', 'detector_aux': 'detector_unobs_aux'})
Σ_ou = Σ.sel(detector=[1, 2], detector_aux=[0]).rename({'detector': 'detector_obs', 'detector_aux': 'detector_unobs'})
Σ_uo = Σ.sel(detector=[0], detector_aux=[1, 2]).rename({'detector': 'detector_unobs', 'detector_aux': 'detector_obs'})
Σ_oo = Σ.sel(detector=[1, 2], detector_aux=[1, 2]).rename({'detector': 'detector_obs', 'detector_aux': 'detector_obs_aux'})

# This is not at all obvious how to do, I followed this:
# https://github.com/pydata/xarray/discussions/7503
Σ_oo_inv = xr.apply_ufunc(np.linalg.inv,
                          Σ_oo,
                          input_core_dims=[["detector_obs", "detector_obs_aux"]],
                          output_core_dims=[["detector_obs", "detector_obs_aux"]],
                          exclude_dims=set(("detector_obs", "detector_obs_aux")))

β = Σ_uo.dot(Σ_oo_inv, dims=['detector_obs']).rename({'detector_obs_aux': 'detector_obs'})

new_obs = np.array([195, 166])[:, None]
μ_u_hat = μ_u + β.dot(new_obs - μ_o, dims=['detector_obs'])
Σ_u_hat = Σ_uu - β.dot(Σ_oo, dims=['detector_obs']).dot(β.rename({'detector_obs': 'detector_obs_aux'}), dims=['detector_obs_aux'])
# Since the unobserved part is 1d, I use np.random.normal. In the general case, use
# np.random.multivariate_normal, and concatenate the results.
# Note that Σ_u_hat is a variance, so take its square root to get the scale.
d_u = np.array([np.random.normal(loc=mu, scale=np.sqrt(sigma)) for mu, sigma in zip(μ_u_hat.values.squeeze(), Σ_u_hat.values.squeeze())])
plt.hist(d_u, bins=100);
This gives you the following histogram:
Evidently, seeing the values 195 and 166 on detectors 1 and 2 causes the expected value of detector 0 to go down by about 3 units (compare this to the posterior predictive shown above). I would say that is expected, because the values 195 and 166 are both below the means for detectors 1 and 2 (again, compare to above), and looking at the (expected) covariance matrix, detector 0 has positive covariance with both 1 and 2. Thus, they should all go up or down together:
import seaborn as sns
sns.heatmap(post.cov.mean(dim=['sample']).to_dataframe().unstack('detector_aux')['cov'], annot=True)
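One last aside on the comment in the code above: if more than one detector were unobserved, the final sampling step would use np.random.multivariate_normal, one draw per posterior sample. A sketch, assuming μ_u_hat and Σ_u_hat were built with list selections so they keep the 'detector_unobs' and 'detector_unobs_aux' dimensions:

# Hypothetical general case with several unobserved detectors.
# Assumes μ_u_hat has dims ('detector_unobs', 'sample') and
# Σ_u_hat has dims ('detector_unobs', 'detector_unobs_aux', 'sample').
rng = np.random.default_rng()
mean_draws = μ_u_hat.transpose('sample', 'detector_unobs').values
cov_draws = Σ_u_hat.transpose('sample', 'detector_unobs', 'detector_unobs_aux').values
d_u_general = np.stack([rng.multivariate_normal(m, C) for m, C in zip(mean_draws, cov_draws)])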
Now, is all this necessary? By that I mean, "can't PyMC just do this for us?" Fair question, and I'm not sure. As @julien noted, this marginalization of unobserved components is exactly what Gaussian Processes do, but that's more of a special case. I'm also not positive if this is conjugate updating of the posterior or something else. But I think it might be what you have in mind?