Posterior Predictive Checks with an 'error-in-variables' model

Hi there,

I have a conceptual query regarding the ‘error-in-variables’ approach. Is it possible to perform meaningful posterior predictive checks against the observed noisy predictor data (let’s call that xobs)? Since the underlying process is a function of the latent, unobserved, true predictor (call this xt), my understanding is that this is at least complicated with the methods available in PyMC3 (I’m using version 3.11.2), if not impossible.

If I take the posterior predictive check example, I note that the checks are performed using the scaled observed predictor data xobs (as expected). But in the ‘error-in-variables’ approach, a set of xobs is sampled from the posterior for each sample when running sample_posterior_predictive. So plotting mu_pp.mean (see the example) and plot_hdi against the original observed data xobs does not seem relevant, given that each realisation of mu_pp is generated from a sampled set of xobs. Is there a principled way to approach this?

Regards,
Jack

Preamble: PyMC3 is legacy software and is no longer maintained. This reply assumes you upgrade to the latest version of PyMC, which is 5.16.2 at the time of writing.

You can add measurement error to covariates by assigning them distributions and using the (noisy) data as observations of those distributions. Here’s an example:

import pymc as pm
import numpy as np
import arviz as az

with pm.Model() as error_model:
    # The covariate x gets its own distribution; the noisy measurements will
    # later be attached as observations of it via pm.observe
    mu_x = pm.Normal('mu_x', sigma=100)
    sigma_x = pm.Exponential('sigma_x', 0.1)
    x = pm.Normal('x', mu=mu_x, sigma=sigma_x, size=(100,))

    # Linear regression on x
    alpha = pm.Normal('alpha', sigma=100)
    beta = pm.Normal('beta', sigma=10)
    mu = pm.Deterministic('mu', alpha + beta * x)

    sigma = pm.Exponential('sigma', 1)
    obs = pm.Normal('obs', mu=mu, sigma=sigma)

Use this model to take some prior draws, choose one to use as data, and then modify the model to include the observations:

with error_model:
    prior = pm.sample_prior_predictive()

# Pick one prior draw to serve as the "true" data-generating values
draw_idx = np.random.choice(500)
true_data = prior.prior.isel(chain=0, draw=draw_idx)

# Attach that draw as observations of both x and obs
model = pm.observe(error_model, {x: true_data.x, obs: true_data.obs})
model.to_graphviz()

Here’s what you end up with. Notice that uncertainty from x flows down into obs.

Sample the model and make some plots:

with model:
    idata = pm.sample()
    idata = pm.sample_posterior_predictive(idata, extend_inferencedata=True)
    
var_names = ['mu_x', 'sigma_x', 'alpha', 'beta']
ref_val = np.r_[[true_data[x].item() for x in var_names]].tolist()
az.plot_posterior(idata, var_names=var_names, ref_val=ref_val)
az.plot_ppc(idata, var_names=['obs', 'x'])


The posterior predictive draws for obs are computed conditional on the draws of x, so they combine the uncertainty in x with the uncertainty in the regression parameters. You can verify this by comparing the outputs to a second model that uses the x data directly:

with pm.Model() as fixed_x_model:
    # Flat placeholders: the actual parameter values come from the existing
    # posterior draws in idata when the posterior predictive is sampled
    alpha = pm.Flat('alpha')
    beta = pm.Flat('beta')
    mu = pm.Deterministic('mu', alpha + beta * true_data.x.values)

    sigma = pm.Flat('sigma')
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=true_data.obs.values)
    fixed_x_idata = pm.sample_posterior_predictive(idata, extend_inferencedata=False, var_names=['mu', 'obs'])

Plot results:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
az.plot_ppc(idata, var_names=['obs'], ax=ax, mean=False, observed=False, legend=False, alpha=0.1)
az.plot_ppc(fixed_x_idata, var_names=['obs'], ax=ax, color='tab:orange', mean=False, observed=False, legend=False, alpha=0.1)

Blue includes uncertainty on x, orange does not


Good morning Jesse,

Thank you for your quick response, it is much appreciated!

I have a couple of queries as I get my head around this approach.

  1. If I interpret the model graph correctly, the underlying true value mu is a function of the noisy observation of x, and the parameters alpha & beta act on that noisy x. The example I’ve looked at for an ‘error-in-variables’ approach from this post is slightly, but importantly, different: alpha & beta act on the true, underlying variable mu_x, and the uncertainty in x flows through to obs via the draws of mu_x taken for each sample. Is there a particular reason why the model you’ve shown is set up as mu = f(x) rather than mu = f(mu_x)?
  2. From that starting point, I think I follow the process. The draws of obs being conditional on x (or, in the model structure I’m assuming, mu_x) makes sense. But, and this is where I’m struggling conceptually, if we then want to use a call to plot_hdi (such as is shown on the example PPC page), what values do we plot the HDI against? It seems to me it should be against mu_x, but I’m not sure which mu_x (an average of realisations?).

Thanks again.

Regards,
Jack

mu_x is your prior on the mean of the distribution that generates x, and it is a scalar. If you use it in your definition of mu, as in mu = alpha + beta * mu_x, you will not have any variation in the x dimension with which to identify beta. It is possible that this is what you want, but in that case I don’t know, conceptually, what you are expecting from the model.
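
To spell out the identification problem: because mu_x is a single number, that alternative model would give every observation the same regression mean,

\displaystyle \mu_i = \alpha + \beta \, \mu_x \quad \text{for every } i,

so the y data only inform the combination \alpha + \beta \mu_x and cannot separate \beta from \alpha.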

You cannot “plot against” mu_x, because there is only one value of mu_x. If you wanted to plot HDIs in this case, I would first make a scatterplot of the true data at positions (x_true, y_true), then do a forest-style plot showing the HDI in the y-dimension vertically and the HDI in the x-dimension horizontally.
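
Something like this rough sketch, re-using idata and true_data from the example above (here I take the HDIs from the posterior predictive draws of x and obs, though you could use the posterior of x instead):

import arviz as az
import matplotlib.pyplot as plt

# Posterior predictive HDIs for each of the 100 data points
x_hdi = az.hdi(idata.posterior_predictive['x'])['x'].values        # shape (100, 2)
obs_hdi = az.hdi(idata.posterior_predictive['obs'])['obs'].values  # shape (100, 2)

x_true = true_data['x'].values
y_true = true_data['obs'].values

fig, ax = plt.subplots()
ax.scatter(x_true, y_true, color='k', zorder=3, label='true data')
ax.hlines(y_true, x_hdi[:, 0], x_hdi[:, 1], alpha=0.3)     # HDI in the x-dimension
ax.vlines(x_true, obs_hdi[:, 0], obs_hdi[:, 1], alpha=0.3)  # HDI in the y-dimension
ax.legend()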

Thanks Jesse. Apologies, I think I’ve confused matters here by mixing nomenclature between your model and the example I’m following.

I’ll refer directly to the model structure used in the ‘error-in-variables’ post (see the link in my previous post); the model structure is shown below. If I’m not mistaken, this is different to the set-up in your example.

[Model structure figure from the linked post]

The model is then implemented as below:

with pm.Model() as model:
    # Observation noise and prior scale for the regression parameters
    err_odr = pm.HalfNormal('err_odr', 5.)
    err_param = pm.HalfNormal('err_param', 5.)

    # Latent "true" x values; the observed x are noisy measurements of them
    # (note: sd= is the old PyMC3 keyword, spelled sigma= in newer PyMC)
    x_lat = pm.Normal('x_mu', 0, 5., shape=x.shape[0])
    x_obs = pm.Normal('x_obs', mu=x_lat, sd=err_odr, observed=x, shape=x.shape[0])

    # Regression on the latent x
    offset = pm.Normal('offset', 0, err_param)
    slope = pm.Normal('slope', 0, err_param)
    y_pred = pm.Deterministic('y_pred', offset + slope * x_lat)
    y_obs = pm.Normal('y', mu=y_pred, sd=err_odr, observed=y)

    trace = pm.sample(800, tune=1500, chains=4, cores=3, init='jitter+adapt_diag')

So, we have some observed data x and y. x is assumed to be a set of noisy observations of a true underlying latent variable x_mu. y is assumed to be a set of noisy observations of the true underlying variable y_pred, and y_pred is assumed to be a function of the underlying latent variable x_mu.

When the model is sampled, a set of x_mu values is drawn for each sample.

This is where I’m stuck. To the suggestion of a scatterplot of the true data as the basis for the HDIs: this would need to use the means of the sampled x_mu values, and of the resultant y_pred, with the HDIs of x_obs and y_obs on top. I’m going to give this a go, perhaps along the lines of the sketch below.
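
Something like this, I think, assuming trace is an InferenceData and that posterior predictive draws of x_obs and y have been added with pm.sample_posterior_predictive:

import arviz as az
import matplotlib.pyplot as plt

# Posterior means of the latent x values and the corresponding regression mean
post = trace.posterior
x_mu_mean = post['x_mu'].mean(dim=('chain', 'draw')).values
y_pred_mean = post['y_pred'].mean(dim=('chain', 'draw')).values

# Posterior predictive HDIs of the noisy observations
pp = trace.posterior_predictive
x_obs_hdi = az.hdi(pp['x_obs'])['x_obs'].values  # shape (n, 2)
y_obs_hdi = az.hdi(pp['y'])['y'].values          # shape (n, 2)

fig, ax = plt.subplots()
ax.scatter(x_mu_mean, y_pred_mean, s=15, label='posterior means')
ax.hlines(y_pred_mean, x_obs_hdi[:, 0], x_obs_hdi[:, 1], alpha=0.3)
ax.vlines(x_mu_mean, y_obs_hdi[:, 0], y_obs_hdi[:, 1], alpha=0.3)
ax.scatter(x, y, color='k', marker='x', label='observed data')
ax.legend()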

For posterior predictive checks, you’re usually generating replicates of the modeled data, not the unmodeled data like covariates. So in a regression of (dependent) y on (independent) x, the posterior predictive checks would keep x fixed and generate replicates of y. The fact that x has measurement error doesn’t change this.

I’m pretty sure that’s what @jessegrabowski meant by

Notice that uncertainty from x flows down into obs

Yes, you just marginalize the x_mu draws out by ignoring them, the same way you “ignore” the other parameters. In math, you have some noisy x, some noisy y, and some parameters, which we’ll collectively call theta. What you’re trying to compute with posterior predictive checks for y is this:

\displaystyle p(y^{\textrm{sim}} \mid y) = \int_{\Theta} p(y^{\textrm{sim}} \mid x, \theta) \cdot p(\theta \mid x, y) \, \textrm{d}\theta.

The x_\mu are treated like parameters and just get marginalized out like all the other elements of \theta.
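
In practice this happens automatically, since each posterior predictive draw of y re-uses one joint posterior draw of (x_mu, offset, slope, err_odr). A minimal sketch, assuming a recent PyMC version where trace is returned as an InferenceData:

import arviz as az
import pymc as pm

with model:
    # Replicate y conditional on each joint posterior draw; x_mu is never
    # handled explicitly, which is exactly the marginalization above
    pm.sample_posterior_predictive(trace, var_names=['y'], extend_inferencedata=True)

# Standard check of the replicated y against the observed y
az.plot_ppc(trace, var_names=['y'])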


Good morning Bob,

Thank you for your response. The concept of marginalizing out the sampled x_mu values makes sense.

Regards,
Jack