I have built a Bayesian linear regression model using PyMC3. On evaluation, I found that the MAPE (mean absolute percentage error) on the training set is higher than on the test set. What could be the reason for this, given that the same variable means are used for the test set?
This is hard to answer without more information. Have you tried other error metrics, like RMSE, that aren’t scaled as a percentage? Have you tried randomly selecting a new training/test split? If n is small, you might have just gotten lucky and happened to draw a test set that the linear model predicts well. That isn’t impossible in theory, just increasingly unlikely as n grows. If n is large, and RMSE still shows lower error on the test set than on the training set even after a new random train/test split, it might just be model misspecification.
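To make the comparison concrete, here is a minimal sketch of checking both metrics on a single random split. It uses a plain `LinearRegression` as a stand-in for the posterior-mean predictions of your PyMC3 model, and synthetic `X`, `y` as placeholders for your data — both are assumptions, not your actual setup:

```python
# Sketch: compare MAPE and RMSE on train vs. test for one random split.
# LinearRegression stands in for the PyMC3 posterior-mean predictor;
# X, y below are synthetic placeholders for the real data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 1.0 + rng.normal(scale=0.1, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_tr, y_tr)

def mape(y_true, y_pred):
    # Mean absolute percentage error; undefined if y_true contains zeros.
    return np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

for name, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
    pred = model.predict(Xs)
    print(f"{name}: MAPE={mape(ys, pred):.4f}  RMSE={rmse(ys, pred):.4f}")
```

Note that MAPE divides by the true values, so it can look very different from RMSE when some targets are close to zero — which is one reason to check both.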
Thanks. I checked MAE_train and MAE_test, which are 0.0523275 and 0.0571307 respectively, and RMSE_train and RMSE_test, which are 0.066204 and 0.0608991 respectively. The test set is very small, so I should do cross-validation and check the results.
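With a small test set, K-fold cross-validation gives a per-fold error distribution instead of a single noisy number. A minimal sketch, again using `LinearRegression` on synthetic data as a stand-in for the PyMC3 model:

```python
# Sketch: 5-fold cross-validated MAE, so one small test set does not
# drive the train-vs-test comparison. LinearRegression stands in for
# the PyMC3 model; X, y are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

maes = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    maes.append(np.mean(np.abs(y[test_idx] - pred)))

print("per-fold MAE:", np.round(maes, 4))
print("mean:", np.mean(maes), "sd:", np.std(maes))
```

If the spread across folds is comparable to the 0.0048 train/test MAE gap you reported, that gap is likely just sampling noise.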
Hmm. I would try another random train/test split and rerun the model.
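The suggestion above can be automated by looping over several split seeds and seeing whether test error beating train error persists. A hedged sketch, once more with a plain `LinearRegression` on synthetic data rather than the actual PyMC3 model:

```python
# Sketch: repeat the train/test split over several seeds to check whether
# "test error below train error" is stable or an artifact of one split.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.2, size=120)

for seed in range(5):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=seed)
    m = LinearRegression().fit(X_tr, y_tr)
    mae_tr = np.mean(np.abs(y_tr - m.predict(X_tr)))
    mae_te = np.mean(np.abs(y_te - m.predict(X_te)))
    print(f"seed {seed}: MAE train={mae_tr:.4f}  test={mae_te:.4f}")
```

If test MAE beats train MAE for most seeds, the pattern is systematic; if it flips sign from seed to seed, it was just the luck of one split.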