Aleatoric and Epistemic uncertainty

Hello

I am trying to derive the aleatoric and epistemic uncertainty from a pymc3 model and I’m not sure how to do this.

For example given this model (taken from glm-linear):

import numpy as np
import pymc3 as pm

size = 200
true_intercept = 1
true_slope = 2

x = np.linspace(0, 1, size)
true_regression_line = true_intercept + true_slope * x
# add noise
y = true_regression_line + np.random.normal(scale=.5, size=size)

with pm.Model() as model:
    # priors on the noise scale and the regression coefficients
    sigma = pm.HalfCauchy('sigma', beta=10, testval=1.)
    intercept = pm.Normal('Intercept', 0, sigma=20)
    x_coeff = pm.Normal('x', 0, sigma=20)

    # expected value of y at each x
    mu_likelihood = intercept + x_coeff * x

    # observation model
    likelihood = pm.Normal('y', mu=mu_likelihood,
                           sigma=sigma, observed=y)

    # draw posterior samples
    trace = pm.sample(3000, cores=2)

I would like to know the uncertainty over the mu_likelihood variable (epistemic uncertainty) as opposed to the uncertainty over the likelihood variable itself (aleatoric uncertainty). As an aside, the motivation for this is to separate the uncertainty the model has about its own predictions from the uncertainty inherent in the system being modelled.

I’ve seen this done with TensorFlow Probability and it would be good to know if something similar can be done with the pymc3 framework.

Thanks for any help!

Hi,
IIUC what you’re asking, you can get the uncertainty around mu_likelihood by wrapping it in a deterministic variable: mu_likelihood = pm.Deterministic("mu_likelihood", intercept + x_coeff * x). Then, PyMC will store it in the posterior trace.
The total uncertainty, integrating over everything that is uncertain in the model, can be obtained by sampling from the posterior predictive distribution (e.g. with pm.sample_posterior_predictive), thus simulating new data (y here).
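
To put numbers on that split, here is a minimal NumPy sketch based on the law of total variance: the variance of the mean across posterior draws is the epistemic part, and the posterior average of sigma squared is the aleatoric part. The function name decompose_variance is made up for illustration, and the draws below are synthetic stand-ins, not real sampler output. With an actual trace you would pass trace['mu_likelihood'] and trace['sigma'] instead.

```python
import numpy as np


def decompose_variance(mu_draws, sigma_draws):
    """Split the predictive variance at each data point (law of total variance).

    mu_draws:    shape (n_draws, n_points), posterior draws of the mean
    sigma_draws: shape (n_draws,), posterior draws of the noise scale
    """
    mu_draws = np.asarray(mu_draws, dtype=float)
    sigma_draws = np.asarray(sigma_draws, dtype=float)
    epistemic = mu_draws.var(axis=0)       # Var over draws of E[y | parameters]
    aleatoric = np.mean(sigma_draws ** 2)  # E over draws of Var[y | parameters]
    total = epistemic + aleatoric          # variance of the posterior predictive
    return epistemic, aleatoric, total


# Synthetic stand-in draws (ASSUMPTION: these only mimic the shape of a trace).
rng = np.random.default_rng(0)
n_draws, n_points = 4000, 200
x = np.linspace(0, 1, n_points)
intercept_draws = rng.normal(1.0, 0.07, size=n_draws)
slope_draws = rng.normal(2.0, 0.12, size=n_draws)
sigma_draws = np.abs(rng.normal(0.5, 0.025, size=n_draws))
mu_draws = intercept_draws[:, None] + slope_draws[:, None] * x[None, :]

epistemic, aleatoric, total = decompose_variance(mu_draws, sigma_draws)
print("epistemic variance at x=0 and x=1:", epistemic[0], epistemic[-1])
print("aleatoric variance:", aleatoric)
```

The epistemic term varies with x and should shrink as more data is observed, while the aleatoric term is a single number here because the model uses one shared sigma.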

Just wanna point out that there is an interesting video about aleatoric and epistemic uncertainty by Aki Vehtari here, and that @RavinKumar is also gonna mention this in his upcoming PyData global talk :wink:
Hope this helps :vulcan_salute:

Ah thank you so much @AlexAndorra - that is exactly what I was looking for! And thanks for the resource as well.

For completeness - you could approximate the aleatoric uncertainty by plugging the posterior means of mu_likelihood and sigma into a normal distribution and drawing from it for each data point (here we draw 2000 samples per point; this assumes mu_likelihood was stored as a Deterministic):
np.random.normal(trace['mu_likelihood'].mean(axis=0), trace['sigma'].mean(), size=(2000, size))
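
As a rough sketch of how those plug-in draws could be summarised, the snippet below computes a 94% band per data point. The values mu_mean and sigma_mean are hypothetical stand-ins for trace['mu_likelihood'].mean(axis=0) and trace['sigma'].mean(), chosen only so the example runs without a fitted model.

```python
import numpy as np

rng = np.random.default_rng(1)
size = 200
x = np.linspace(0, 1, size)

# ASSUMPTION: stand-ins for the posterior means taken from a real trace.
mu_mean = 1.0 + 2.0 * x   # would be trace['mu_likelihood'].mean(axis=0)
sigma_mean = 0.5          # would be trace['sigma'].mean()

# Same call shape as in the post, but with a seeded Generator.
noise_draws = rng.normal(mu_mean, sigma_mean, size=(2000, size))

# Central 94% band at every data point.
lower, upper = np.percentile(noise_draws, [3, 97], axis=0)
width = upper - lower
print("mean band width:", width.mean())
```

Because the parameters are fixed at their posterior means, the band width is about the same at every x and reflects only the observation noise. Any uncertainty about the intercept and slope is deliberately left out.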