Hi community,
I’m trying to find a way to measure the pure noise produced by the model itself, assuming we have ideal observations (i.e., noise-free data). The key idea is to understand the model’s contribution to the overall uncertainty. Are there any ways to investigate this? What metrics would be appropriate? For example, in the attached image, you can see such noise.
Thank you in advance
I’m not sure what you mean by noise produced by the model. Do you mean the posterior uncertainty you get from having only observed a sample of the data? Or do you mean underlying model variation? For example, if I have a linear regression with data-generating process y_n \sim \textrm{normal}(\alpha + \beta \cdot x_n, \sigma), then there are two sources of uncertainty in the posterior predictions: (1) the uncertainty from not knowing \alpha, \beta, \sigma, and (2) the uncertainty from the process itself having error scale \sigma. Even when you know \alpha, \beta, \sigma exactly, the second type of uncertainty never goes away.
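To make the two sources concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions: the function name `predictive_variance_parts` and the values of alpha, beta, sigma are made up for the example, and the uncertainty in (\alpha, \beta) is approximated by the usual least-squares covariance (roughly a flat-prior posterior with \sigma plugged in) rather than taken from a full Bayesian fit.

```python
import numpy as np

def predictive_variance_parts(x, y, x_new):
    """Split the predictive variance at x_new into a parameter part and a noise part.

    Assumption: least-squares approximation, not a full posterior.
    """
    X = np.column_stack([np.ones_like(x), x])      # design matrix with columns [1, x]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # estimates of (alpha, beta)
    resid = y - X @ coef
    sigma2_hat = resid @ resid / (len(y) - 2)      # estimate of sigma^2
    cov = sigma2_hat * np.linalg.inv(X.T @ X)      # approximate covariance of (alpha, beta)
    v = np.array([1.0, x_new])
    param_var = v @ cov @ v                        # source (1): shrinks as N grows
    noise_var = sigma2_hat                         # source (2): stays near sigma^2
    return param_var, noise_var

# Hypothetical true values, chosen only for this illustration.
alpha, beta, sigma = 1.0, 2.0, 0.5
rng = np.random.default_rng(0)
for n in (20, 200, 2000):
    x = rng.uniform(-1, 1, size=n)
    y = rng.normal(alpha + beta * x, sigma)
    param_var, noise_var = predictive_variance_parts(x, y, x_new=0.5)
    print(f"N={n:5d}  parameter part={param_var:.5f}  noise part={noise_var:.5f}")
```

As N grows, the parameter part should head toward zero while the noise part settles near sigma^2, which is the sense in which the second source never goes away.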
If you’re concerned about the first type of noise, it’s governed by the central limit theorem in well-behaved cases: as you take more observations, the error in estimating \alpha, \beta, \sigma goes down as \mathcal{O}(1 / \sqrt{N}) with N observations, with a constant determined by the posterior standard deviation.
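A quick way to see the 1 / \sqrt{N} rate is a small simulation. The sketch below is again only illustrative: the helper `slope_estimate_sd` and the true values are hypothetical, and it measures the spread of the least-squares estimate of \beta across repeated simulated data sets, which in well-behaved cases tracks the posterior standard deviation.

```python
import numpy as np

def slope_estimate_sd(n, n_reps=1000, alpha=1.0, beta=2.0, sigma=0.5, seed=0):
    """Standard deviation of the least-squares slope over n_reps simulated data sets of size n."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, size=(n_reps, n))
    y = rng.normal(alpha + beta * x, sigma)
    xc = x - x.mean(axis=1, keepdims=True)                  # center x within each data set
    slopes = (xc * y).sum(axis=1) / (xc ** 2).sum(axis=1)   # closed-form slope estimate
    return slopes.std()

# Quadrupling N should roughly halve the spread of the estimate.
for n in (100, 400, 1600):
    print(f"N={n:5d}  sd of slope estimate={slope_estimate_sd(n):.4f}")
```

Each fourfold increase in N should cut the reported spread by about a factor of two, which is what \mathcal{O}(1 / \sqrt{N}) predicts.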