Multiple Linear Regression model

HNLala · November 12, 2020, 11:35am

Hello All,
I am trying to fit a Multiple Linear Regression Model with 2 predictor variables.
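For reference, a two-predictor linear model with Gaussian noise can be written as below. The symbol names are generic notation, not necessarily what the notebook uses:

$$
y_i = \beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)
$$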

The best-fit parameters returned by PyMC3 are significantly different from the true values (taken from the literature).

The data file (CSV) is here: exp.csv (54.9 KB)

The notebook with the code and a few diagnostic plots can be found here:

What can I do to find a better-fitting model?

mattiasthalen · December 11, 2020, 6:40am

I can’t run your script because some files are missing.

But my first thought was that you have set the sigmas of your parameter priors to 100, which means you think the values might fall anywhere between -300 and 300 (about ±3 sigma). Try narrowing them.
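The ±300 figure is the usual three-sigma rule: a Normal(0, 100) prior puts about 99.7% of its mass within ±3 × 100. A quick stdlib-only check, assuming the priors are centred at zero (the helper name `normal_mass_within` is just for illustration):

```python
import math


def normal_mass_within(k: float) -> float:
    """Probability that a Normal(mu, sigma) draw lies within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2.0))


prior_sigma = 100.0  # the prior sigma discussed above; prior mean assumed to be 0
for k in (1, 2, 3):
    lo, hi = -k * prior_sigma, k * prior_sigma
    print(f"Normal(0, {prior_sigma:g}): {normal_mass_within(k):.4f} of prior mass in [{lo:g}, {hi:g}]")
```

Halving the prior sigma halves these intervals, which is what "narrowing" the priors amounts to.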

Or standardize your inputs and give the parameters priors with mu = 0, sigma = 1.
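A minimal sketch of the standardization idea, using plain NumPy least squares as a stand-in for the PyMC3 model and synthetic data (all values are hypothetical, not taken from exp.csv). It shows that coefficients fitted on standardized predictors can be mapped back to the original units, so standardizing changes the scale the priors act on, not the fitted relationship:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for exp.csv (hypothetical values, not the poster's data)
n = 500
x1 = rng.normal(0.1, 0.05, n)
x2 = rng.normal(2.0, 0.3, n)
y = -1.5 - 2.9 * x1 + 0.4 * x2 + rng.normal(0.0, 0.3, n)


def fit_ols(X, y):
    """Least-squares coefficients with an intercept column prepended."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


X = np.column_stack([x1, x2])
mean, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / sd  # standardized predictors: mean 0, sd 1

b_raw = fit_ols(X, y)
b_std = fit_ols(Z, y)

# Map standardized-scale coefficients back to the original units:
#   slope_j = beta_j / sd_j,  intercept = beta_0 - sum_j slope_j * mean_j
slopes = b_std[1:] / sd
intercept = b_std[0] - np.sum(slopes * mean)
b_back = np.concatenate([[intercept], slopes])

print("raw fit:         ", np.round(b_raw, 3))
print("back-transformed:", np.round(b_back, 3))
```

The two printed rows should agree, because an OLS fit with an intercept is invariant to shifting and rescaling the predictors.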

Edit
Actually, I ran the ols function from statsmodels:

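import pandas as pd
from statsmodels.formula.api import ols

df = pd.read_csv('exp.csv')  # assumes the CSV has columns y, x1, x2
ols_fit = ols('y ~ x1 + x2', df).fit()  # ordinary least squares with an intercept
print(ols_fit.summary())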

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.212
Model:                            OLS   Adj. R-squared:                  0.211
Method:                 Least Squares   F-statistic:                     161.4
Date:                Fri, 11 Dec 2020   Prob (F-statistic):           8.42e-63
Time:                        09:14:42   Log-Likelihood:                -411.91
No. Observations:                1203   AIC:                             829.8
Df Residuals:                    1200   BIC:                             845.1
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.5132      0.010   -153.822      0.000      -1.532      -1.494
x1            -2.9102      0.171    -17.061      0.000      -3.245      -2.576
x2             0.3921      0.029     13.741      0.000       0.336       0.448
==============================================================================
Omnibus:                       44.879   Durbin-Watson:                   1.860
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              106.670
Skew:                          -0.163   Prob(JB):                     6.87e-24
Kurtosis:                       4.422   Cond. No.                         17.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The parameters here are close to what your PyMC3 model generates.
So my question is, how did you arrive at the true values for the parameters?
The data suggests something else.
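One way to see why the PyMC3 estimates land near OLS: with a Gaussian likelihood and independent Normal(0, τ²) priors on the coefficients, the posterior mean is (XᵀX + (σ²/τ²) I)⁻¹ Xᵀy, a ridge-like shrinkage of OLS. A minimal sketch, assuming the noise sd σ is known (so the posterior has a closed form, unlike the full PyMC3 model, which also infers sigma) and using synthetic data with hypothetical coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data (hypothetical coefficients, not the poster's data)
n = 1200
X = np.column_stack([np.ones(n), rng.normal(0.0, 0.06, n), rng.normal(0.0, 0.35, n)])
true_beta = np.array([-1.5, -2.9, 0.4])
noise_sd = 0.34  # assumed known here
y = X @ true_beta + rng.normal(0.0, noise_sd, n)


def posterior_mean(X, y, noise_sd, prior_sd):
    """Posterior mean of beta for y ~ N(X beta, noise_sd^2), beta_j ~ N(0, prior_sd^2).

    Assumes the noise sd is known, so the posterior is Gaussian in closed form.
    """
    k = X.shape[1]
    penalty = (noise_sd / prior_sd) ** 2
    return np.linalg.solve(X.T @ X + penalty * np.eye(k), X.T @ y)


beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print("OLS:", np.round(beta_ols, 3))
for prior_sd in (100.0, 1.0, 0.1):
    print(f"prior sd {prior_sd:>5}:", np.round(posterior_mean(X, y, noise_sd, prior_sd), 3))
```

With prior sd 100 the posterior mean is practically the OLS estimate; only much tighter priors pull it noticeably toward zero. So if the literature values differ from OLS, wide priors alone are unlikely to explain the gap.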
