[Beginner level question on modeling] Bayesian analysis of F1 scores from two ML models

Can you provide a sketch of what the data looks like? Is the data the F1 scores themselves? And what’s the question you are trying to answer?