Multiple Linear Regression model

Hello All,
I am trying to fit a Multiple Linear Regression Model with 2 predictor variables.

The best-fit parameters returned by pymc3 are significantly different from the true values (taken from the literature).

The data (as a csv) file is here: exp.csv (54.9 KB)

The notebook with the code and a few diagnostic plots can be found here:

What can I do to find a better-fitting model?

I can’t run your script because of missing files.

But my first thought was that you have set the sigmas of your priors to 100, which means you think the parameter values might fall anywhere between about -300 and 300 (three sigmas either side of the mean). Try narrowing them.
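
To put rough numbers on that: assuming the priors are Normal and centred on 0 (I can't see the notebook, so that is a guess), sigma = 100 puts about 99.7% of the prior mass between -300 and 300. The ±5 range below is just a hypothetical "plausible" range for comparison, not something taken from your data.

```python
from statistics import NormalDist

# Assumption: a Normal prior centred on 0 with sigma = 100.
prior = NormalDist(mu=0, sigma=100)


def prior_mass(dist, low, high):
    """Prior probability that a parameter lies in [low, high]."""
    return dist.cdf(high) - dist.cdf(low)


# Mass inside the three-sigma range implied by sigma = 100.
mass_3sigma = prior_mass(prior, -300, 300)

# Mass inside a much narrower, hypothetical range of plausible values.
mass_narrow = prior_mass(prior, -5, 5)

print(f"P(-300 < beta < 300) = {mass_3sigma:.4f}")
print(f"P(-5 < beta < 5)     = {mass_narrow:.4f}")
```

If the second number is small, the prior is spending most of its mass on values you don't actually believe in.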

Or standardize your inputs and set the priors to mu = 0, sigma = 1.
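
If you do standardize, keep in mind that the fitted coefficients come out in standardized units, so they need converting back before you compare them with literature values. Here is a minimal numpy sketch of that round trip. The data are synthetic (a stand-in for exp.csv), plain least squares stands in for the posterior means, and the function names are my own.

```python
import numpy as np


def standardize(a):
    """Return (z, mean, std), where z has zero mean and unit std per column."""
    a = np.asarray(a, dtype=float)
    mean = a.mean(axis=0)
    std = a.std(axis=0)
    return (a - mean) / std, mean, std


def unstandardize_coefs(alpha_z, beta_z, x_mean, x_std, y_mean, y_std):
    """Map an intercept and slopes fitted on standardized data back to original units."""
    beta = beta_z * y_std / x_std
    alpha = y_mean + y_std * alpha_z - np.sum(beta * x_mean)
    return alpha, beta


# Synthetic stand-in data with two predictors on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[5.0, -2.0], scale=[0.1, 3.0], size=(500, 2))
y = -1.5 - 2.9 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(scale=0.3, size=500)

Xz, x_mean, x_std = standardize(X)
yz, y_mean, y_std = standardize(y)

# Least squares on the standardized scale (stand-in for the posterior means).
A = np.column_stack([np.ones(len(yz)), Xz])
coef_z, *_ = np.linalg.lstsq(A, yz, rcond=None)

alpha, beta = unstandardize_coefs(coef_z[0], coef_z[1:], x_mean, x_std, y_mean, y_std)
print("intercept:", alpha)
print("slopes:   ", beta)
```

On the standardized scale the slopes are of order 1, which is why mu = 0, sigma = 1 priors are a reasonable default there.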

Edit
Actually, I ran the ols function from statsmodels:

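``````
import pandas as pd
from statsmodels.formula.api import ols

# Load the attached data; the column names y, x1, x2 are the ones used in the formula
df = pd.read_csv('exp.csv')

ols_fit = ols('y ~ x1 + x2', data=df).fit()
ols_fit.summary()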

OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.212
Method:                 Least Squares   F-statistic:                     161.4
Date:                Fri, 11 Dec 2020   Prob (F-statistic):           8.42e-63
Time:                        09:14:42   Log-Likelihood:                -411.91
No. Observations:                1203   AIC:                             829.8
Df Residuals:                    1200   BIC:                             845.1
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.5132      0.010   -153.822      0.000      -1.532      -1.494
x1            -2.9102      0.171    -17.061      0.000      -3.245      -2.576
x2             0.3921      0.029     13.741      0.000       0.336       0.448
==============================================================================
Omnibus:                       44.879   Durbin-Watson:                   1.860
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              106.670
Skew:                          -0.163   Prob(JB):                     6.87e-24
Kurtosis:                       4.422   Cond. No.                         17.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
``````

The parameters here are close to what your PyMC3 model generates.
So my question is, how did you arrive at the true values for the parameters?
The data suggests something else.
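
One quick way to see how far off the literature values are is to check them against the 95% intervals in the OLS table above. A rough sketch: the intervals are copied from that table, but the literature values are not given in this thread, so the numbers in `literature` are made-up placeholders that you should replace with your own.

```python
# 95% confidence intervals copied from the OLS summary above.
ols_ci = {
    "Intercept": (-1.532, -1.494),
    "x1": (-3.245, -2.576),
    "x2": (0.336, 0.448),
}

# PLACEHOLDER values only, not from any paper: substitute the literature values here.
literature = {"Intercept": -1.5, "x1": -1.0, "x2": 0.4}


def outside_interval(values, intervals):
    """Return the names of parameters whose value lies outside its interval."""
    flagged = []
    for name, value in values.items():
        low, high = intervals[name]
        if not (low <= value <= high):
            flagged.append(name)
    return flagged


flagged = outside_interval(literature, ols_ci)
print("Outside the OLS 95% interval:", flagged)
```

Any parameter that gets flagged is one where this data set and the literature disagree, regardless of whether you fit with OLS or PyMC3.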