Do we need a testing set?

Thank you both for your replies I think this clarifies a lot! Haha yeah the moment you said the statement in the presentation you could feel the room go quiet!

So let me try to summarize what I understand. Given that we are certain about our model is the right one to represent our data (your point 1.), since the frequentist model doesn’t carry uncertainty throughout the model and through its predictions this allows the model to overfit the data by learning something that minimize some cost function as much as possible without taking into account the uncertainty.

If we compare that to a bayesian model where uncertainty is kept through all the process (our parameters are actually probability distributions that best represent our data), the model will take that uncertainty into account for the predictions also and won’t overfit the data. The probability distributions of the parameters account for the uncertainty in the data and thus can’t really overfit the data.

But, this doesn’t remove the need to test the performance of the model on unseen data if we want to use it for prediction.

Thanks a lot for your time and I will definitely watch your next presentations if they are on youtube!

5 Likes