Wow! That was really helpful.
Honestly, I got so carried away by the prospect of a better explanation for the differences in the performance data that I didn’t stop to think that I was trying to model the wrong part of the problem.
I assumed that as long as I came up with decent priors (based not on the values in the data, but on what I know beforehand about how F1 scores tend to behave), the posterior distribution, i.e. the probability of the parameters conditional on the data (the observed F1 scores), would give me a better representation of what the F1 scores from these models look like. Basically, I thought I could get a better answer by betting on reasonable priors and letting the model condition on the results.
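To make the "reasonable priors plus conditioning on the observed scores" idea concrete, here is a minimal sketch of p(mu | scores) ∝ p(scores | mu) p(mu) using a grid approximation. Everything in it is an assumption for illustration: the hypothetical scores, a Beta prior on the mean F1 score `mu` (`prior_a`, `prior_b`), and a Beta likelihood for each score with a fixed, made-up concentration `kappa`. It only shows the mechanics of prior times likelihood, not which part of the problem should be modelled.

```python
import math


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def grid_posterior(scores, prior_a=8.0, prior_b=2.0, kappa=50.0, n_grid=999):
    """Grid approximation of p(mu | scores), proportional to p(scores | mu) * p(mu).

    Assumed model (hypothetical): mu ~ Beta(prior_a, prior_b) encodes what F1
    scores tend to look like, and each observed score is
    Beta(mu * kappa, (1 - mu) * kappa) with a fixed concentration kappa.
    """
    grid = [(i + 1) / (n_grid + 1) for i in range(n_grid)]  # mu values in (0, 1)
    log_post = []
    for mu in grid:
        lp = beta_logpdf(mu, prior_a, prior_b)  # log prior
        lp += sum(beta_logpdf(y, mu * kappa, (1 - mu) * kappa)
                  for y in scores)  # log likelihood of the observed scores
        log_post.append(lp)
    # Normalise stably: subtract the max log value before exponentiating.
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, probs):
    """Expected value of mu under the grid posterior."""
    return sum(mu * p for mu, p in zip(grid, probs))


scores = [0.78, 0.81, 0.75, 0.83, 0.79]  # hypothetical F1 scores
grid, probs = grid_posterior(scores)
print(round(posterior_mean(grid, probs), 3))
```

With a prior centred around 0.8 and scores averaging about 0.79, the posterior mean lands close to the data, and the posterior is much narrower than the prior; changing `prior_a`/`prior_b` shows how much (or how little) the prior matters once the scores are conditioned on.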