How to correctly interpret the noise parameter σ in a Bayesian linear regression model when the predictors are standardized but the target variable is not.

Hi everyone,

I’m currently preparing to contribute to PyMC for GSoC 2026 and have been learning the modeling workflow using a Bayesian linear regression example (California housing dataset).

The model structure is roughly:

yᵢ ~ Normal(μᵢ, σ)
μᵢ = α + βXᵢ

One conceptual question I had concerns the interpretation of σ in the likelihood. In my workflow, the predictors X are standardized, while the target variable y is not.

In this case, should σ primarily be interpreted as observational noise in the target variable, or does it also absorb variation caused by omitted predictors or model misspecification?

Related to this, if the posterior for σ ends up relatively large, how should that be interpreted when evaluating the adequacy of the model (e.g., during posterior predictive checks)?

Any pointers or resources on how to reason about σ in practice when building more complex PyMC models would be greatly appreciated.

Thanks!

Project Repo Link
I spent some more time exploring this and wanted to follow up with a concrete check.

From my current understanding:

  • Since X is standardized but y is not, σ remains on the scale of y and represents the typical deviation between observed y and μ.

  • However, it seems σ is not just observational noise — it also absorbs variability from omitted predictors and potential model misspecification.
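One way to check the first bullet concretely: standardizing X rescales the fitted slope but leaves the residual scale (and hence σ) unchanged. A minimal NumPy sketch with simulated data, using least squares as a stand-in for the regression fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x_raw = rng.normal(loc=50.0, scale=10.0, size=n)        # unstandardized predictor
y = 3.0 + 0.2 * x_raw + rng.normal(scale=2.0, size=n)   # y on its own scale

def resid_sd(x, y):
    # least-squares fit of y on [1, x], then sd of the residuals
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ coef).std()

x_std = (x_raw - x_raw.mean()) / x_raw.std()
# Same residual scale either way, close to the generative noise sd of 2.0:
print(resid_sd(x_raw, y), resid_sd(x_std, y))
```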

To probe this, I looked at:

  • The posterior of σ, which is centered at a fairly large value and is also relatively wide

  • Posterior predictive checks, where the spread of simulated y seems quite broad compared to the actual data
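The second bullet can be reproduced in a small simulation: fit a linear model to data that actually contains a quadratic term, and the estimated noise scale comes out far larger than the true one (least squares stands in for the posterior mean here; pure NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
# The data-generating process has a quadratic term the linear model omits:
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=n)

# Misspecified linear fit
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = (y - X @ coef).std()

# "Posterior predictive" draws under the linear model
y_rep = X @ coef + rng.normal(scale=sigma_hat, size=(1000, n))

# sigma_hat ends up far above the true noise scale (0.5): it has absorbed
# the omitted quadratic term, which is why the y_rep spread looks too broad
print(sigma_hat)
```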

This makes me suspect that σ is compensating for model limitations rather than just noise.

As a next step, I’m planning to:

  • Try a Student-T likelihood to see if heavy tails explain part of this

  • Compare models using LOO to check if the fit improves

Does this seem like a reasonable way to interpret σ and diagnose the issue? Or would you recommend a different way to disentangle noise vs model misspecification in this setting?

In a simple model like this, \sigma is accounting for all the noise, including noise in X and y.

You can add a measurement error model to directly model noise in X.

\sigma is the scale (i.e., standard deviation) of the residuals, which are the differences between the observations and predictions based on the regression, y_n - (\alpha + \beta \cdot x_n).