Hello,
Thank you for your reply and apologies for taking so much time to get back to you, I ve been sick last week.
Regarding your questions.
Are you maybe trying to extrapolate to predict the sklearn model’s residuals for data that is outside the range already covered by the available data?
Yes that is one of the goals. I know there are issues with generalization, but even a limited one would help.
in Bayesian inference we don’t do train/test splitting. Always fit the PyMC model to all available data at the same time. The Bayesian model formulation captures uncertainty already and also extrapolates if that’s what you are interested in. To judge how good a model is you let the model predict/generate new data (posterior predictive) and compare this Predictive distribution with the data. The residuals between the data and the Predictive distribution tell you about lack of fit errors of the model.
Hm, Interesting observation. I 'll give this some thought. However for our use case we are assuming that “test” data come without targets so we won’t be able to use them to train residuals. We do want to estimate however what the distribution of residuals is likely to be (at those test data).
Think of the “sigma” parameter of the likelihood as the measurement error. In your model this is sigma=e^mu
, so you assume that your positive residuals have exponentially large measurement errors, while negative residuals are extremely accurate? You probably forgot to do a at.abs()
, but even then I would advise to start with a constant sigma
.
Thank you for those suggestions, I 'll give them a try. The reason I added sigma = e^mu
is that I has getting errors at initialization and training (sometimes, not consistently). I wasn’t sure how to debug them but I noticed that If I used a functional expression that sigma couldn’t be estimated as negative no matter the inputs then those errors went away. I have tried many approaches to not have sigma be negative and the one in the notebook worked best. I knew there were drawbacks but even there you clarified them better than I realized, thanks.
P.S. I think I tried adding abs
at sigma
but there was a problem. I 'll have to check my notes, and get back to you on that.
If you’re trying to build a model to extrapolate the error of the sklearn model, to be able to predict with uncertainty how biased the sklearn model will be on new data, then what assumptions do you have about this extrapolation? Remember that this measure is the mu
variable going into the likelihood. Is a linear model a good model for this?
-
The key assumption is that there is no concept drift.
-
Moreover a linear model is likely not the best, but I am using it as a starting point. Given how the dataset I am using is generated (as detailed in the first notebook) a second order polynomial could be the best option, if the model managed to learn the true relationship of the target and the data. But when I tried a second order model I got way worse results (and way bigger training times).
Overall this has been a learning exercise. I 'll try to see if I can set up the dependence of sigma
in a simpler way without the exponent but also without getting errors during training. Any suggestions on this front are welcome.