Assume the following model: y = b_0 + b_1 * x, where we place priors on b_0 and b_1.

Let I denote our historical data and x^* denote future inputs.

Let p(b_0, b_1|I) denote our posteriors.

We can then define the posterior predictive distribution as p(y^*|I, x^*) = \int p(y^*|b_0, b_1, x^*) p(b_0, b_1|I) \, db_0 \, db_1.

Now my question is: why do we also sample the noise in the likelihood term p(y^*|b_0, b_1, x^*), rather than disregarding it and using only the posterior samples to compute the desired quantity? Does this procedure have a name?

This model is not complete, as you have not specified how the left and right sides are actually connected (e.g., the equality here seems to be false unless your data is extremely unrealistic). Typically, people slap a "noise term" onto this expression to yield:

y = b_0 + b_1 * x + \epsilon

Sometimes, if they are particularly pedantic, they might specify the form of \epsilon:

\epsilon \sim N(0,\sigma)

But most people just leave this off and pretend like it doesn't exist. But it does! In a Bayesian context, it's more conventional to write the same expression like this:

y \sim N(b_0 + b_1 * x, \sigma)

or perhaps something like

\mu = b_0 + b_1 * x \\
y \sim N(\mu, \sigma)

This makes it much clearer that y is a random variable and that the "noise" is an intrinsic part of your model. Now when you go to generate posterior predictive samples, you can calculate \hat{\mu} and use it to generate one "regression line" per posterior predictive sample. Or you can generate a set of \hat{y} for each posterior predictive sample. Which do you choose?
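To make the two options concrete, here is a minimal NumPy sketch. The arrays b0, b1, and sigma are hypothetical stand-ins for posterior draws (in practice they would come out of your sampler), and the numbers used to fake them, along with x_star, are made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for posterior draws of b_0, b_1 and sigma.
# In a real analysis these come from your sampler, not from rng.normal.
n_draws = 4000
b0 = rng.normal(1.0, 0.1, size=n_draws)
b1 = rng.normal(2.0, 0.1, size=n_draws)
sigma = np.abs(rng.normal(0.5, 0.05, size=n_draws))

x_star = 3.0  # a single future input (assumed value)

# Option 1: one value of mu per posterior draw, no observation noise.
mu_star = b0 + b1 * x_star

# Option 2: one simulated observation per posterior draw, noise included.
y_star = rng.normal(mu_star, sigma)

# Central 95% intervals for each quantity.
mu_interval = np.percentile(mu_star, [2.5, 97.5])
y_interval = np.percentile(y_star, [2.5, 97.5])

print("interval for mu at x_star:", mu_interval)
print("interval for y  at x_star:", y_interval)
```

The interval for mu_star only reflects uncertainty about the parameters, i.e. about where the mean sits. The interval for y_star also includes the spread of individual observations around that mean, so it comes out wider.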

There is no universal "desired quantity" when one does a posterior predictive check. The user/analyst needs to figure out what exactly is being "checked" and generate quantities relevant for that goal.
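As one example of matching the quantity to the goal, suppose what you want to check is the spread of the outcomes. The sketch below uses synthetic "historical" data and hypothetical stand-in posterior draws (both invented for illustration, not output from a real fit) and compares the observed standard deviation against replicated datasets generated with and without the noise term.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "historical" data, made up for illustration only.
n_obs = 200
x = rng.uniform(0.0, 1.0, size=n_obs)
y_obs = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, size=n_obs)

# Hypothetical stand-ins for posterior draws (a real analysis would
# use sampler output here instead).
n_draws = 1000
b0 = rng.normal(1.0, 0.05, size=n_draws)
b1 = rng.normal(2.0, 0.05, size=n_draws)
sigma = np.abs(rng.normal(1.0, 0.05, size=n_draws))

# One row per posterior draw, one column per observed x.
mu_rep = b0[:, None] + b1[:, None] * x[None, :]
y_rep = rng.normal(mu_rep, sigma[:, None])

# Statistic being checked: the standard deviation of the outcomes.
t_obs = y_obs.std()
t_mu = mu_rep.std(axis=1)  # noise-free "datasets"
t_y = y_rep.std(axis=1)    # datasets with observation noise

# Share of replicated datasets whose spread is at least the observed one.
p_mu = (t_mu >= t_obs).mean()
p_y = (t_y >= t_obs).mean()

print("share with spread >= observed, mu only:", p_mu)
print("share with spread >= observed, with noise:", p_y)
```

For this particular statistic, the noise-free values of mu are not comparable to the observed data at all: they never reach the observed spread, because the noise is part of what generates that spread. If the goal were instead to look at uncertainty in the regression line itself, mu_rep would be the relevant quantity.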

So, to answer your question (I think), the procedure you are describing is just called a posterior predictive check. But there are many such checks. Check this notebook out for some examples.