Hi folks,
I’m new to PyMC and Bayesian modeling/analysis.
I have been trying to use PyMC to run an experiment in which I compare the results of two machine learning models. Specifically, I’m interested in comparing their F1 scores (which range from 0 to 1, and which I get from running 10-fold cross-validation on both models).
So, to get things going and start making modeling choices, I simulate experiment data as follows:
import numpy as np

def estimate_beta_params(mu, var):
    # moment matching: recover Beta shape parameters from a target mean and variance
    alpha = ((1 - mu) / var - 1 / mu) * mu**2
    beta = alpha * (1 / mu - 1)
    return (alpha, beta)

estimate_beta_params(0.75, 0.1)  # -> (0.65625, 0.21875)
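As a sanity check on the moment-matching formulas (assuming scipy is available; the helper is re-defined here so the snippet is self-contained), the recovered shape parameters should reproduce the target mean and variance:

```python
from scipy.stats import beta as beta_dist

def estimate_beta_params(mu, var):
    # moment matching: recover Beta shape parameters from a target mean and variance
    alpha = ((1 - mu) / var - 1 / mu) * mu**2
    beta = alpha * (1 / mu - 1)
    return (alpha, beta)

a, b = estimate_beta_params(0.75, 0.1)
mean, var = beta_dist.stats(a, b, moments="mv")
print(a, b)                     # 0.65625 0.21875
print(float(mean), float(var))  # approximately 0.75 and 0.1
```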
# rounded shape parameters from estimate_beta_params(0.75, 0.1)
results_from_model_1 = np.random.beta(a=0.65, b=0.22, size=120)
results_from_model_2 = np.random.beta(a=0.65, b=0.22, size=120)
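And a quick check that the simulated draws look like plausible F1 scores (the seed here is arbitrary; note that with alpha, beta < 1 the Beta distribution is U-shaped, so the draws pile up near 0 and 1, which may or may not resemble real cross-validation results):

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, just for reproducibility
simulated = rng.beta(a=0.65, b=0.22, size=120)

# all draws lie in [0, 1], so they are valid F1 values;
# the sample mean should land near the target of 0.75
print(simulated.mean())
```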
However, as I mentioned, I’m new to Bayesian modeling in general, so I’m not sure how to fully specify a probabilistic model for this comparison.
So far, I have decided to place Beta priors on the means because their values range from 0 to 1. Additionally, as a simplifying assumption, I assume that both groups have mean 0.75 and variance 0.1 (see the method above).
import pymc as pm

with pm.Model() as model:
    # given that the values for the means (i.e., F1) range from 0 to 1,
    # apply Beta priors on them
    # also, _arbitrarily_ set the hyperparameters to mu = 0.75 and variance = 0.1
    # (see the estimate_beta_params method)
    model1_mean = pm.Beta("model 1 mean", alpha=0.65, beta=0.22)
    model2_mean = pm.Beta("model 2 mean", alpha=0.65, beta=0.22)
After going over some of PyMC’s named distributions, I am still not sure how to specify the likelihoods. Any suggestions?
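To make the question concrete, here is the rough direction I was considering (purely a sketch; the concentration value kappa below is an arbitrary choice of mine, not something from the docs): keep a Beta prior on each mean mu, introduce a concentration kappa = alpha + beta, and give the observed F1 scores a Beta likelihood with alpha = mu * kappa and beta = (1 - mu) * kappa. Outside PyMC, the log-likelihood such a term would contribute can be evaluated with scipy:

```python
import numpy as np
from scipy.stats import beta as beta_dist

rng = np.random.default_rng(0)
observed = rng.beta(0.65, 0.22, size=120)       # stand-in for one model's F1 scores
observed = np.clip(observed, 1e-12, 1 - 1e-12)  # guard against draws at exactly 0 or 1

mu = 0.75      # a candidate value of the mean parameter
kappa = 0.875  # hypothetical concentration (alpha + beta)

alpha = mu * kappa
beta = (1 - mu) * kappa
log_lik = beta_dist.logpdf(observed, alpha, beta).sum()
print(log_lik)  # total log-likelihood of the data under this parameter setting

# in PyMC, I believe the corresponding likelihood term would look something like:
# pm.Beta("model 1 obs", alpha=model1_mean * kappa,
#         beta=(1 - model1_mean) * kappa, observed=results_from_model_1)
```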