Assume the following model: y = b_0 + b_1 * x, where we set some priors on b_0 and b_1.
Let I denote our historical data and x^* denote future inputs.
Let p(b_0, b_1|I) denote our posteriors.
We can then define the posterior predictive distribution as p(y^*|I, x^*) = \int p(y^*|b_0, b_1, x^*) p(b_0, b_1|I) \, db_0 \, db_1.
Now my question is: why do we also sample the noise in the likelihood term p(y^*|b_0, b_1, x^*), rather than disregarding it and just using the posterior samples to compute the desired quantity? Does this procedure have a name?
This model is not complete, as you have not specified how the left and right sides are actually connected (e.g., the equality here would be false unless your data were extremely unrealistic, i.e., perfectly noiseless). Typically, people slap a "noise term" onto this expression to yield:
y = b_0 + b_1 x + \epsilon
Sometimes, if they are particularly pedantic, they might specify the form of \epsilon:
\epsilon \sim N(0,\sigma)
But most people just leave this off and pretend like it doesn't exist. But it does! In a Bayesian context, it's more conventional to write the same expression like this:
y \sim N(b_0 + b_1 x, \sigma)
or perhaps something like
\mu = b_0 + b_1 x \\
y \sim N(\mu, \sigma)
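For concreteness, here is a minimal sketch of that model; PyMC, the toy data, and the particular priors are my own assumptions, not something the question specifies:

```python
# Sketch only: PyMC is one choice; Stan, NumPyro, etc. express the same model.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y_obs = 1.0 + 2.0 * x + rng.normal(0, 0.3, size=x.size)  # toy data, made up

with pm.Model():
    b0 = pm.Normal("b0", mu=0, sigma=10)       # prior on intercept
    b1 = pm.Normal("b1", mu=0, sigma=10)       # prior on slope
    sigma = pm.HalfNormal("sigma", sigma=1)    # prior on noise scale
    mu = b0 + b1 * x                           # deterministic mean
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_obs)  # likelihood includes the noise
    idata = pm.sample(1000, tune=1000, random_seed=0)
```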
This makes it much clearer that y is a random variable and that the "noise" is an intrinsic part of your model. Now when you go to generate posterior predictive samples, you can calculate \hat{\mu} and use it to generate one "regression line" per posterior predictive sample. Or you can generate a set of \hat{y} for each posterior predictive sample. Which do you choose?
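Here is a minimal NumPy sketch of those two options; the posterior draws are faked stand-ins (in practice they come from your sampler), so treat the numbers as illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-ins for posterior draws of (b0, b1, sigma); in practice these come
# from your MCMC output (e.g., the idata object in the sketch above).
n_draws = 2000
b0_s = rng.normal(1.0, 0.1, n_draws)
b1_s = rng.normal(2.0, 0.2, n_draws)
sigma_s = np.abs(rng.normal(0.3, 0.05, n_draws))

x_star = np.linspace(0, 1, 25)  # future inputs x^*

# Option 1: one noise-free "regression line" per posterior draw,
# mu_hat = b0 + b1 * x^*  -> uncertainty about the mean only.
mu_hat = b0_s[:, None] + b1_s[:, None] * x_star[None, :]

# Option 2: full posterior predictive draws, y_hat ~ N(mu_hat, sigma),
# i.e. the noise in the likelihood is sampled too -> uncertainty about
# new observations, which is what p(y^*|I, x^*) describes.
y_hat = rng.normal(mu_hat, sigma_s[:, None])

print(mu_hat.std(axis=0).mean(), y_hat.std(axis=0).mean())  # y_hat is wider
```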
There is no universal "desired quantity" when one does a posterior predictive check. The user/analyst needs to figure out what exactly is being "checked" and generate quantities relevant for that goal.
So, to answer your question (I think): the procedure you are describing is just called a posterior predictive check. But there are many such checks. Check this notebook out for some examples.
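As one illustration, here is a hedged sketch of a single such check, asking whether the model reproduces the spread of the observed outcomes; the data, the stand-in posterior draws, and the choice of statistic are all made up for the example:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy observed data and stand-in posterior draws, as in the sketches above.
x = np.linspace(0, 1, 50)
y_obs = 1.0 + 2.0 * x + rng.normal(0, 0.3, size=x.size)
n_draws = 2000
b0_s = rng.normal(1.0, 0.1, n_draws)
b1_s = rng.normal(2.0, 0.2, n_draws)
sigma_s = np.abs(rng.normal(0.3, 0.05, n_draws))

# Replicated datasets at the observed x, sampling the noise as well.
mu_rep = b0_s[:, None] + b1_s[:, None] * x[None, :]
y_rep = rng.normal(mu_rep, sigma_s[:, None])

# One possible check: does the model reproduce the standard deviation of the
# observed outcomes? (Any statistic relevant to your goal works here.)
t_obs = y_obs.std()
t_rep = y_rep.std(axis=1)
ppp = (t_rep >= t_obs).mean()  # posterior predictive p-value
print(f"observed sd: {t_obs:.3f}, ppp-value: {ppp:.2f}")
```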