Help for building model

DIlyess · April 12, 2024, 6:08pm

Hi, I am really new to pymc and I tried to build a multivariate linear model with my beta coefficient being horeshoe prior with the following code and I could not manage to lower my errors to improve model’s performance, what would you suggest me to try to have better results ?

with pm.Model(coords={"predictors": X_train.columns.values}) as model:

    X = pm.Data("X", X_train, dims=('obs', 'predictors'))
    y = pm.Data("y", y_train, dims=('obs'))

    # Prior on error SD
    sigma = pm.Normal("sigma", 2300, 10)

    # Global shrinkage prior
    tau = pm.HalfStudentT("tau", 10, D0 / (D - D0) * sigma / np.sqrt(N))
    # Local shrinkage prior
    lam = pm.HalfStudentT("lam", 5, dims="predictors")
    c2 = pm.InverseGamma("c2", 5, 1)
    z = pm.Normal("z", 25, 10, dims="predictors")
    # Shrunken coefficients
    beta = pm.Deterministic("beta", z * tau * lam * at.sqrt(c2 / (c2 + tau**2 * lam**2)), dims="predictors")
    # No shrinkage on intercept
    beta0 = pm.Normal("beta0", 100, 25.0)

    scores = pm.Normal("scores", beta0 + at.dot(X, beta), sigma, observed=y, dims=('obs',))

Training :

with model:
    idata = pm.sample(draws=1000, tune=2000, chains=4, random_seed=42)

Using test observation to evaluate the model :

with model:
    pm.set_data({"X": X_test})
    posterior_test = pm.sample_posterior_predictive(idata, predictions=True, extend_inferencedata=False)

At the end my model has an r2 score of 0.72 and the predictions seems to follow the trend of my y_test but my RMSE is way to high (14000 for median sample at 30000). I know it depends a lot on my data but I am trying here to explore more bayesian modeling to improve accuracy.

About the data : I didn’t scale any data and I have both binary and continuous features (690 samples, 866 features)

Thanks !

Topic		Replies	Views
Observed and Simulated data difference and prediction of unseen data v5	0	308	August 2, 2023
How to model a multivariate multilevel linear model Questions	6	646	September 11, 2019
Why doesn't this pymc3 model show shrinkage? Questions	6	1032	March 23, 2020
Multivariate hierarchical linear model with different hyperpriors Questions	2	718	November 2, 2018
Feedback on my first hierarchical bayesian model Questions	3	478	April 20, 2020

Help for building model

Related topics