Bayesian prudence or basic uncertainty management

Hi all!

TL;DR

When I’m working with a model that I know is too simple (either because I’m at the first stages of an iterative process or because of external constraints), I would like to somehow incorporate my belief about the model’s inaccuracy into the estimates, something like widening credible intervals to account for that distrust.

How would you go about it? I think this is not part of standard practice, perhaps because it can easily slip into arbitrary decision making, but there may be more to it. What’s your take?

Toy example

Let’s say I’m modelling delays, and I’m working with the simplest model possible: an exponential with a fixed scale. I’m quite certain that the scale is actually time-varying, but I don’t have the time or computational resources to handle that (imagine it’s not just an exponential but a much more complex model to begin with).

For simplicity, in the example below delays come from just two different scales, but I’m using a single scale to model them. The idea: the model will be wrong, but not more wrong than having no model at all, so it makes sense to deploy it first and take advantage of its value while better approximations are developed.

Model code:

import numpy as np
import pymc as pm
import matplotlib.pyplot as plt

plt.style.use("ggplot")
np.random.seed(42)

# Simulate delays from a 50/50 mixture of two exponentials with different scales
scale1 = 1
scale2 = 5
samples1 = np.random.exponential(scale1, 500)
samples2 = np.random.exponential(scale2, 500)
all_samples = np.concatenate([samples1, samples2])

# Deliberately over-simple model: a single fixed-scale exponential for all delays
with pm.Model() as model:
    scale = pm.HalfNormal("scale", 10)
    obs = pm.Exponential("obs", scale=scale, observed=all_samples)
    trace = pm.sample()

Scale posterior:

pm.plot_posterior(trace, var_names=["scale"])

The true scales are 1 and 5, while the model is quite sure that the scale is between 2.5 and 3. To me this is perfectly fine: the model’s claim is not absolute but relative, that is, it’s conditional on the parametrisation being right. “If this is the data-generating process, then the scale parameter must be 2.5-3.2.” The core of my question has to do with reframing the condition underlying this statement (spoiler: I want probabilities conditional on my overall beliefs, not just on the model assumptions). This is more easily seen by looking at the posterior predictive.

with model:
    posterior = pm.sample_posterior_predictive(trace)

# Posterior predictive CDFs overlaid on the empirical CDF of the observed delays
ax = pm.plot_ppc(posterior, kind="cumulative", num_pp_samples=100)

# Empirical fraction of observed delays below 10
cdf_at_10 = (posterior.observed_data < 10).mean()
ax.axvline(10, linestyle="--")
ax.axhline(cdf_at_10.to_array().item(), linestyle="--")

Let’s say I’m concerned with the probability of delays of less than 10 (let’s say minutes). Once again, the model is quite confident that the probability is between 0.94 and 0.96, when it’s actually about 0.93. I see no problem with this: the probabilities are conditional on the model assumptions being right, which they are not.
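For reference, the “actually 0.93” can be checked in closed form from the simulated mixture (just a sanity check on the simulation, not something the model sees):

# True P(delay < 10) under the simulated 50/50 mixture of Exp(scale=1) and Exp(scale=5)
p_true = 0.5 * (1 - np.exp(-10 / scale1)) + 0.5 * (1 - np.exp(-10 / scale2))
print(p_true)  # ~0.932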

However, I want to use the model in production nonetheless because it’s kind of useful. I’m not worried about predictions being biased, but I don’t want them to express overconfidence because they fail to take into account the inaccuracy introduced by simplification. I want predictions conditioned not just on the validity of the model’s assumptions, but also on my belief about that validity.

In summary, I don’t trust the model I made and I want to reflect that somehow in the predictions I get. I think this should translate into wider predictive credible intervals.

I think, historically, prudence, as one of the cardinal virtues, was the art of incorporating disbelief in one’s own assumptions into one’s decisions. I’m looking for a way to make this more explicit within a Bayesian framework.

Regards,
Juan.


If you want to stick to a Bayesian approach, you don’t. The posterior inferences are determined by the model and the data. Instead, when the model is too simple, you change the model. In the Bayesian world, you are looking for the literature on calibration (starting with Dawid and ending with Gneiting et al.).

By way of contrast, in ML, usually both the model and the algorithm to fit the model are in play. Rarely will an ML approach fit the actual model it specifies—there are almost always implicit priors (or what an ML researcher would call “inductive biases”). It sounds like what you’re looking for is what the ML folks call “conformal prediction.” That lets you take a miscalibrated ML output and try to calibrate it with empirical data.
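A minimal split-conformal sketch for the delays example (illustrative only; the calibration set and the choice of score are assumptions): with the trivial nonconformity score s(y) = y, the method reduces to picking an empirical quantile of a calibration set with a finite-sample correction, so the empirical data, rather than the model, sets the bound.

# Split-conformal sketch: a one-sided (1 - alpha) prediction bound for delays.
# With the trivial score s(y) = y, the bound is the ceil((n + 1) * (1 - alpha))-th
# smallest delay in a calibration set (exchangeability is all that's assumed).
def conformal_upper_bound(calibration_delays, alpha=0.1):
    scores = np.sort(np.asarray(calibration_delays))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return scores[min(k, n) - 1]

# With this trivial score the fitted model plays no role; a model-based score
# (e.g. the model's predictive CDF evaluated at each delay) is where a
# miscalibrated model output would get recalibrated against data.
upper_90 = conformal_upper_bound(all_samples, alpha=0.1)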

Jessica Hullman has been blogging about calibration and conformal prediction on Andrew Gelman’s blog.


Thanks for your input Bob. I’ve made a small edit to my question because you’ve made me realise my use of the word calibration was imprecise/misleading.

I’m okay with probabilities not matching empirical data. As I see it, if there’s any value in priors, I would actually like probabilities to be miscalibrated (calibration now used in the “conformal prediction” sense).

What I’m concerned with is that I get predictive distributions that are too narrow (and this appraisal could be based just on domain expertise, not necessarily data) because they are conditional on a set of assumptions that I know don’t exactly hold. So I want to incorporate this knowledge to add uncertainty to the predictions.

The edit I made:
Before: “I want confidence in predictions to be well calibrated.”
After: “I don’t want predictions to express overconfidence because they fail to take into account the inaccuracy introduced by simplification.”

I don’t want predictions to express overconfidence because they fail to take into account the inaccuracy introduced by simplification.

I understood that—it’s a common problem we wrestle with all the time.

How do you propose to measure whether or not you are “expressing overconfidence” other than evaluating the match to empirical data? The definition of “calibration” is just the right amount of uncertainty.
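In the toy example above, one concrete version of that check could be empirical coverage on held-out data; a sketch, where holdout_delays is a hypothetical array of delays not used for fitting:

# Empirical coverage of the model's central 90% posterior predictive interval,
# evaluated on held-out delays (holdout_delays is hypothetical).
pp = posterior.posterior_predictive["obs"].values.ravel()
low, high = np.quantile(pp, [0.05, 0.95])
coverage = np.mean((holdout_delays >= low) & (holdout_delays <= high))
print(f"nominal 90% interval, empirical coverage: {coverage:.2f}")
# Coverage well below 0.90 is the overconfidence being discussed.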

Clarke and Yao just wrote a nice paper, “A Cheat Sheet for Bayesian Prediction,” in Statistical Science, introducing their notion of “predictive Bayes” (this is what I was saying wasn’t even Bayes—there’s no official definition of “Bayesian”). They also talk about stacking multiple misspecified models, variants of which have been the go-to method for winning prediction contests for the last few decades. You can even look at something like boosted decision trees as a kind of stacking of simpler models.
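On the PyMC/ArviZ side, stacking weights can be obtained with az.compare once at least two candidate models have been fit; a sketch, where trace_simple and trace_richer are hypothetical refits of the single-scale model and a richer alternative, both sampled with pointwise log-likelihoods stored:

import arviz as az

# Hypothetical: trace_simple and trace_richer are InferenceData objects from two
# fitted models, each sampled with pm.sample(idata_kwargs={"log_likelihood": True}).
cmp = az.compare(
    {"single_scale": trace_simple, "richer_model": trace_richer},
    method="stacking",
)
print(cmp["weight"])  # weights for stacking the models' predictive distributions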

The key phrase you want for this in the Bayesian literature is “model misspecification”. There are various ways to adjust misspecified models to make them more accurate. Conformal prediction is only one of those. I don’t know the literature well enough to provide much more in the way of suggestions for what to explore next.


Thanks again for your input, Bob, and apologies for the misunderstanding; I jumped on the “conformal prediction” thing and lost track of the rest. Also, thanks for the reference to Jessica Hullman; I’ve been enjoying reading her posts.

To me this would be somewhat analogous to evaluating priors. You contrast the model’s prior predictions with your own understanding of where prior predictions should lie. Here you contrast the model’s confidence in its predictions with your own understanding of how confident those predictions should be given the model’s limitations.
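In the PyMC example above, the prior-side check being described would look something like this sketch:

# Prior predictive check: contrast where the model says delays could plausibly
# lie before seeing data with where you believe they should lie.
with model:
    prior_pred = pm.sample_prior_predictive()

pm.plot_ppc(prior_pred, group="prior", kind="cumulative", num_pp_samples=100)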

I like that definition; to me it’s consistent with the fact that uncertainty is also conditional on one’s prior understanding of the problem.

Fair point. I guess to me what’s specific about “Bayesian” in this context is the understanding of probability as a quantification of uncertainty, which need not be based solely on data. But I see there are many different views on this.

Thanks! That’s definitely key. I like this presentation of misspecification I’ve found:

where by misspecified we mean that the analyst is unwilling to act as if the model is correct.

Also, I found this, which aligns pretty well with my question about intentional misspecification:

Here we assume that a decision has been made to work with a misspecified model for reasons of practicality. For example, a more realistic model might require information that is difficult or expensive to obtain or might be too complex to interpret easily.

I still have to go through the literature to come back with something concrete; it looks like it’s definitely not straightforward, but there are ways to go about it. One of them is called “SafeBayes” (I like the name because it is aligned with the prudential approach of erring on the side of caution), which

takes into account model misspecification to replace the likelihood with a downgraded version expressed as a power of the original likelihood.
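A fixed-power version of that idea on the toy model above might look like the sketch below (eta is a hypothetical fixed value here; SafeBayes chooses it in a data-driven way, and this assumes the same PyMC version as above, where Exponential accepts a scale parameterisation):

eta = 0.5  # hypothetical fixed downweighting exponent; SafeBayes learns this

with pm.Model() as tempered_model:
    scale = pm.HalfNormal("scale", 10)
    # Power likelihood: scale the log-likelihood by eta and add it via a
    # Potential instead of declaring an observed variable.
    loglike = pm.logp(pm.Exponential.dist(scale=scale), all_samples).sum()
    pm.Potential("tempered_lik", eta * loglike)
    tempered_trace = pm.sample()

With eta < 1 the data count for less, so the posterior on scale, and with it the predictive intervals, should widen, which is the cautious behaviour I’m after.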

Perhaps this is what you meant originally when you said that you have to change the model somehow. I guess I understood that as “adding complexity to reduce misspecification” and didn’t find it satisfying, but now I see the point.

Models can be misspecified for many reasons. Common ones are assuming symmetry where there isn’t symmetry, assuming normal tails when there are wide tails, assuming Poisson dispersion when it’s really more overdispersed, using priors that concentrate mass in the wrong volumes when applied jointly, not taking into account various forms of measurement error, not interacting covariates where there is strong heterogeneity, assuming homogeneous errors in a time series that’s heteroskedastic, etc.

You can also take away complexity or just change things that leave it no more or less complex (e.g., swapping in a Student-t error model for a normal error model).
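For example, that last swap is complexity-neutral in PyMC terms (a sketch, with hypothetical predictor x and outcome y):

with pm.Model() as robust_model:
    intercept = pm.Normal("intercept", 0, 10)
    slope = pm.Normal("slope", 0, 10)
    sigma = pm.HalfNormal("sigma", 10)
    nu = pm.Exponential("nu", 1 / 10)  # degrees of freedom for the Student-t
    mu = intercept + slope * x
    # pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)  # normal errors
    pm.StudentT("y_obs", nu=nu, mu=mu, sigma=sigma, observed=y)  # heavier tails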