Im attempting to build ELO to model the “latent skill” of mma fighters. I’ve built a crude version just in python but I’d like to use random values for some of the hardcoded values as well as build in the idea of uncertainty. The part I can’t wrap my head around is how to build a model that works iteratively. Since the point of ELO is to award/subtract values from opponents at the time of the match I can’t figure out how to progress through the dataset of fights iteratively with the ELO rankings of the fighters at that moment rather than their “final” ELO ranking.
I’ve adapted the post about Modeling Time Series Data to fit my needs and it starts to sample but the run time gets >16 hours and it inevitably crashes.
My dataset is about 60,000 rows so am I just kinda stuck? Any insight to clean this up would be much appreciated.
df = fight_history_lean.copy()
df['team1_log_gamenum'] = np.log(df['total_fights_a']+ 1e-10)
df['team2_log_gamenum'] = np.log(df['total_fights_b']+ 1e-10)
max_log_gametime = np.log(df.fighter_a.value_counts() + df.fighter_b.value_counts())
max_log_gametime.index = max_log_gametime.index.astype('category')
df = df.astype({
'fighter_a': max_log_gametime.index.dtype,
'fighter_b': max_log_gametime.index.dtype})
n_fights = len(max_log_gametime)
with pm.Model() as model:
rating_mu_sd = pm.HalfNormal('rating_mu_sd', sigma=2, testval=2)
rating_mu = pm.Normal('rating_mu', sigma=rating_mu_sd, shape=n_fights, testval=np.zeros(n_fights))
sd = pm.HalfNormal('rating_time_sd', sigma=2, testval=2)
raw = pm.Normal('rating_time_raw', sigma=1, shape=n_fights, testval=np.zeros(n_fights))
rating_time = pm.Deterministic('rating_time', sd * raw)
rating_time1 = rating_time[df.fighter_a.cat.codes] * df['team1_log_gamenum']
rating_time2 = rating_time[df.fighter_b.cat.codes] * df['team2_log_gamenum']
theta = (
(rating_mu[df.fighter_a.cat.codes] + rating_time1)
- (rating_mu[df.fighter_b.cat.codes] + rating_time2))
max_rating = rating_mu + max_log_gametime.values.astype(float)
pm.Deterministic('max_elo', 1400 + 400 * max_rating / pm.math.log(10))
pm.Bernoulli('winner', logit_p=theta, observed=df['outcome'])
trace = pm.sample()```