On Aleatoric and Epistemic Uncertainty Decomposition

I have been looking at this post, and I would like to understand the aleatoric and epistemic decomposition part of it. I reproduce the relevant code here for convenience:

import numpy as np
import pymc as pm

size = 200
true_intercept = 1
true_slope = 2

x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x
# add noise
y = true_regression_line + np.random.normal(scale=0.5, size=size)

with pm.Model() as model:
    # priors (in PyMC v4+ the old `testval` keyword is `initval`)
    sigma = pm.HalfCauchy('sigma', beta=10, initval=1.0)
    intercept = pm.Normal('Intercept', 0, sigma=20)
    x_coeff = pm.Normal('x', 0, sigma=20)

    # wrap the mean in a Deterministic so it is stored in the trace
    mu_likelihood = pm.Deterministic("mu_likelihood", intercept + x_coeff * x)
    likelihood = pm.Normal('y', mu=mu_likelihood, sigma=sigma, observed=y)

    trace = pm.sample(3000, cores=2)

The answer suggested wrapping mu_likelihood in pm.Deterministic so that it is stored in the posterior trace.
My first question is: how exactly would one use mu_likelihood to decompose the total uncertainty into aleatoric and epistemic parts?

I would think the law of total variance is the way to go, so from the posterior samples I could compute something like:

import xarray as xr

sampled_lines = trace.posterior["Intercept"] + trace.posterior["x"] * xr.DataArray(x)
mu_samples = sampled_lines.mean(axis=0)
tot_sd = np.sqrt(sampled_lines.mean(axis=0).var(axis=0) + sampled_lines.var(axis=0).mean(axis=0))
lower, upper = mu_samples - 2 * tot_sd, mu_samples + 2 * tot_sd

where the "mean of the variances" term refers to aleatoric uncertainty, while the "variance of the means" term refers to epistemic uncertainty. This approach is used in the GP-Heteroskedastic tutorial (https://www.pymc.io/projects/examples/en/latest/gaussian_processes/GP-Heteroskedastic.html).
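
For reference, the pattern I have in mind is roughly the following sketch (my paraphrase, not the tutorial's code; mu_samples and var_samples are hypothetical stand-ins for per-draw means and per-point noise variances, each of shape (n_samples, n_x)):

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-ins: 1000 posterior draws of the mean line and of
# the per-point noise variance, at 50 x locations each.
mu_samples = rng.normal(size=(1000, 50))
var_samples = rng.uniform(0.1, 0.5, size=(1000, 50))

# Law of total variance, applied separately at each x:
#   Var(y) = E[Var(y | theta)] + Var(E[y | theta])
aleatoric_var = var_samples.mean(axis=0)  # mean of the variances
epistemic_var = mu_samples.var(axis=0)    # variance of the means
total_sd = np.sqrt(aleatoric_var + epistemic_var)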

In this regression case I do not think this works: sampled_lines.mean(axis=0) computes the mean over the different traces, and the subsequent var(axis=0) then computes the variance over the different x realisations, which is not what one wants.
In the Gaussian process example above, the law of total variance can be applied because for each point x I have both an estimate of the mean and an estimate of the variance in each trace; is that not the case?
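
To make the axis point concrete, here is a small sketch of what I believe the shapes are for the model above (stacking chains and draws into one sample dimension is my addition):

# Stack chains and draws into a single sample dimension, so that any
# reduction runs over posterior samples, separately for each x point.
post = trace.posterior.stack(sample=("chain", "draw"))

mu = post["mu_likelihood"]       # one value per x point and per posterior sample
epistemic_sd = mu.std("sample")  # per-x spread of the fitted mean line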

So, how could epistemic and aleatoric uncertainty be decomposed for the case above, using mu_likelihood? I do not have an estimate of the noise, nor of any local variance, so how would one extract these from mu_likelihood alone?

I can see how "all the uncertainty in the model can be obtained by sampling from the posterior predictive distribution", quoting a user in the post above, but it is the aleatoric and epistemic decomposition that interests me.

I appreciate this is not, strictly speaking, a PyMC question, but I would be grateful if somebody could give me a hint. Thank you very much.

Suppose I have a simple Bernoulli model with a beta prior,

\theta \sim \text{beta}(2, 2)

y_n \sim \text{bernoulli}(\theta)

If I observe some data y, I derive a posterior p(\theta \mid y) and then when I want to do posterior predictive inference for new items I get this:

\displaystyle p(\tilde{y} \mid y) = \int_0^1 p(\tilde{y} \mid \theta) \cdot p(\theta \mid y) \, \text{d}\theta.

There are two forms of uncertainty, the uncertainty in parameters represented by the posterior p(\theta \mid y) and uncertainty in the binary outcome given by p(\tilde{y} \mid \theta) = \text{bernoulli}(\tilde{y} \mid\theta).

The uncertainty from the posterior is a kind of epistemic uncertainty, whereas the uncertainty from the sampling distribution of \tilde{y} is a kind of aleatoric uncertainty. The aleatoric uncertainty is irreducible: even if we know \theta exactly, it does not go away. The epistemic uncertainty can be reduced by collecting a larger data set y, which will sharpen the posterior around \theta.
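
Concretely, the law of total variance makes this split explicit: since \mathbb{E}[\tilde{y} \mid \theta] = \theta and \text{Var}(\tilde{y} \mid \theta) = \theta(1 - \theta) for a Bernoulli outcome,

\text{Var}(\tilde{y} \mid y) = \underbrace{\mathbb{E}[\theta(1 - \theta) \mid y]}_{\text{aleatoric}} + \underbrace{\text{Var}(\theta \mid y)}_{\text{epistemic}}.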

To actually generate draws for \tilde{y}, we have to do this:

\tilde{y}^{(n)} \sim \text{bernoulli}(\theta^{(n)}) for \theta^{(n)} \sim p(\theta \mid y).

That is, we first simulate a parameter value \theta^{(n)} from the posterior, then we simulate a draw \tilde{y}^{(n)} from the sampling distribution. We can use MCMC to do the first and then just a random number generator for the second.
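
A minimal NumPy sketch of this two-step recipe (made-up data, and the conjugate beta posterior standing in for MCMC):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: 50 binary outcomes.
y = rng.binomial(1, 0.7, size=50)

# Step 1: with a beta(2, 2) prior the posterior is conjugate,
# p(theta | y) = beta(2 + sum(y), 2 + N - sum(y)), so we can draw
# theta directly instead of running MCMC.
theta = rng.beta(2 + y.sum(), 2 + len(y) - y.sum(), size=4000)

# Step 2: push each posterior draw through the sampling distribution.
y_tilde = rng.binomial(1, theta)

# Sanity check: aleatoric + epistemic is close to the total predictive variance.
aleatoric = (theta * (1 - theta)).mean()  # E[theta(1 - theta) | y]
epistemic = theta.var()                   # Var(theta | y)
print(aleatoric + epistemic, y_tilde.var())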
