How to correctly interpret the noise parameter σ in a Bayesian linear regression model when the predictors are standardized but the target variable is not.

Hi everyone,

I’m currently preparing to contribute to PyMC for GSoC 2026 and have been learning the modeling workflow using a Bayesian linear regression example (California housing dataset).

The model structure is roughly:

yᵢ ~ Normal(μᵢ, σ)
μᵢ = α + βXᵢ

One conceptual question I had concerns the interpretation of σ in the likelihood. In my workflow, the predictors X are standardized, while the target variable y is not.

In this case, should σ primarily be interpreted as observational noise in the target variable, or does it also absorb variation caused by omitted predictors or model misspecification?

Related to this, if the posterior for σ ends up relatively large, how should that be interpreted when evaluating the adequacy of the model (e.g., during posterior predictive checks)?

Any pointers or resources on how to reason about σ in practice when building more complex PyMC models would be greatly appreciated.

Thanks!

Project Repo Link
I spent some more time exploring this and wanted to follow up with a concrete check.

From my current understanding:

  • Since X is standardized but y is not, σ remains on the scale of y and represents the typical deviation between observed y and μ.

  • However, it seems σ is not just observational noise — it also absorbs variability from omitted predictors and potential model misspecification.
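One way to check the first bullet concretely: standardizing X rescales the fitted slope but leaves the residual scale (and hence σ) unchanged. A minimal NumPy sketch with simulated data, using least squares as a stand-in for the regression fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x_raw = rng.normal(loc=50.0, scale=10.0, size=n)        # unstandardized predictor
y = 3.0 + 0.2 * x_raw + rng.normal(scale=2.0, size=n)   # y on its own scale

def resid_sd(x, y):
    # least-squares fit of y on [1, x], then sd of the residuals
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return (y - X @ coef).std()

x_std = (x_raw - x_raw.mean()) / x_raw.std()
# Same residual scale either way, close to the generative noise sd of 2.0:
print(resid_sd(x_raw, y), resid_sd(x_std, y))
```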

To probe this, I looked at:

  • The posterior of σ, which is centered at a fairly large value and is also relatively wide

  • Posterior predictive checks, where the spread of simulated y seems quite broad compared to the actual data
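The second bullet can be reproduced in a small simulation: fit a linear model to data that actually contains a quadratic term, and the estimated noise scale comes out far larger than the true one (least squares stands in for the posterior mean here; pure NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.normal(size=n)
# The data-generating process has a quadratic term the linear model omits:
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=n)

# Misspecified linear fit
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma_hat = (y - X @ coef).std()

# "Posterior predictive" draws under the linear model
y_rep = X @ coef + rng.normal(scale=sigma_hat, size=(1000, n))

# sigma_hat ends up far above the true noise scale (0.5): it has absorbed
# the omitted quadratic term, which is why the y_rep spread looks too broad
print(sigma_hat)
```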

This makes me suspect that σ is compensating for model limitations rather than just noise.

As a next step, I’m planning to:

  • Try a Student-T likelihood to see if heavy tails explain part of this

  • Compare models using LOO to check if the fit improves

Does this seem like a reasonable way to interpret σ and diagnose the issue? Or would you recommend a different way to disentangle noise vs model misspecification in this setting?

In a simple model like this, \sigma is accounting for all the noise, including noise in X and y.

You can add a measurement error model to directly model noise in X.

\sigma is the scale (i.e., standard deviation) of the residuals, which are the differences between the observations and predictions based on the regression, y_n - (\alpha + \beta \cdot x_n).