ELO Rankings and Iterative PYMC

AndrewKillion · January 24, 2024, 4:36pm

Im attempting to build ELO to model the “latent skill” of mma fighters. I’ve built a crude version just in python but I’d like to use random values for some of the hardcoded values as well as build in the idea of uncertainty. The part I can’t wrap my head around is how to build a model that works iteratively. Since the point of ELO is to award/subtract values from opponents at the time of the match I can’t figure out how to progress through the dataset of fights iteratively with the ELO rankings of the fighters at that moment rather than their “final” ELO ranking.

I’ve adapted the post about Modeling Time Series Data to fit my needs and it starts to sample but the run time gets >16 hours and it inevitably crashes.

My dataset is about 60,000 rows so am I just kinda stuck? Any insight to clean this up would be much appreciated.

df = fight_history_lean.copy()

df['team1_log_gamenum'] = np.log(df['total_fights_a']+ 1e-10)
df['team2_log_gamenum'] = np.log(df['total_fights_b']+ 1e-10) 
max_log_gametime = np.log(df.fighter_a.value_counts() + df.fighter_b.value_counts())

max_log_gametime.index = max_log_gametime.index.astype('category')
df = df.astype({
    'fighter_a': max_log_gametime.index.dtype,
    'fighter_b': max_log_gametime.index.dtype})
n_fights = len(max_log_gametime)


with pm.Model() as model:
    rating_mu_sd = pm.HalfNormal('rating_mu_sd', sigma=2, testval=2)
    rating_mu = pm.Normal('rating_mu', sigma=rating_mu_sd, shape=n_fights, testval=np.zeros(n_fights))

    sd = pm.HalfNormal('rating_time_sd', sigma=2, testval=2)
    raw = pm.Normal('rating_time_raw', sigma=1, shape=n_fights, testval=np.zeros(n_fights))
    rating_time = pm.Deterministic('rating_time', sd * raw)

    rating_time1 = rating_time[df.fighter_a.cat.codes] * df['team1_log_gamenum']
    rating_time2 = rating_time[df.fighter_b.cat.codes] * df['team2_log_gamenum']
    theta = (
        (rating_mu[df.fighter_a.cat.codes] + rating_time1)
        - (rating_mu[df.fighter_b.cat.codes] + rating_time2))

    max_rating = rating_mu + max_log_gametime.values.astype(float)
    pm.Deterministic('max_elo', 1400 + 400 * max_rating / pm.math.log(10))
    pm.Bernoulli('winner', logit_p=theta, observed=df['outcome'])
    
    trace = pm.sample()```

jaharvey8 · January 24, 2024, 7:09pm

This seems relevant Seeking Guidance on Modeling ELO Scores in PyMC

Topic		Replies	Views
Seeking Guidance on Modeling ELO Scores in PyMC v5 modeling	2	349	December 18, 2023
Mixed hierarchical model to predict MMA fights version agnostic	4	468	May 3, 2023
Model compilation difficulties with medium-size dataset Questions	4	1153	October 4, 2017
Tournament Skill Estimator, some modelling challenges Questions	7	1302	February 15, 2019
Trueskill model in pymc3 Questions	1	1468	June 17, 2019

ELO Rankings and Iterative PYMC

Related topics