Mixed hierarchical model to predict MMA fights

I'm extremely new to modeling and PyMC, but I'm trying to blend different categories to predict MMA fight outcomes (decision, knockout, submission). Things seem to work pretty well when I'm using a Dirichlet to describe characteristics like “style” and “camp.” But now I want to add a feature for each fighter's “record.” Records are normalized over each fighter's fights and stored in an array with 3 columns, so a fighter with 2 decisions, 2 KOs, and 1 submission is [.4, .4, .2]. I just keep getting infinitesimally small log_probs for the initial values. Any help (and advice elsewhere) would be much, much appreciated.

with pm.Model(coords=coords) as hierarchical_model:
    #Data Inputs
    weightclasses = pm.MutableData("weightclasses", weightclass_idx)
    a_camp = pm.MutableData("a fight camps", fighter_a_Fighter_Camp_idx)
    b_camp = pm.MutableData("b fight camps", fighter_b_Fighter_Camp_idx)
    a_style = pm.MutableData("a fight styles", fighter_a_Primary_Fight_Style_idx)
    b_style = pm.MutableData("b fight styles", fighter_b_Primary_Fight_Style_idx)
    a_record = pm.MutableData('a records', fighter_a_record_clean)
    b_record = pm.MutableData('b records', fighter_b_record_clean)

    weightclass_hyperprior = pm.Dirichlet('weightclass_hyperprior', a=np.array([1, 1, 1]))
    camp_hyperprior = pm.Dirichlet('camp_hyperprior', a=np.array([1, 1, 1]))    
    style_hyperprior = pm.Dirichlet('style_hyperprior', a=np.array([1, 1, 1]))
    record_hyperprior = pm.Dirichlet('record_hyperprior', a=np.array([1, 1, 1]))

    # Priors for weight class-level features
    prior_weightclass = pm.Dirichlet("p_outcome_prior_weightclass", a=weightclass_hyperprior, dims=('weightclass', 'fight outcomes'))
    prior_a_camp = pm.Dirichlet("p_outcome_prior_a_camp", a=camp_hyperprior, dims=('fight camps', 'fight outcomes'))
    prior_b_camp = pm.Dirichlet("p_outcome_prior_b_camp", a=camp_hyperprior, dims=('fight camps', 'fight outcomes'))
    prior_a_style = pm.Dirichlet("p_outcome_prior_a_style", a=style_hyperprior, dims=('fight styles', 'fight outcomes'))
    prior_b_style = pm.Dirichlet("p_outcome_prior_b_style", a=style_hyperprior, dims=('fight styles', 'fight outcomes'))
    prior_a_record = pm.Dirichlet("p_outcome_prior_a_record", a=record_hyperprior, dims=('records', 'fight outcomes'))
    prior_b_record = pm.Dirichlet("p_outcome_prior_b_record", a=record_hyperprior, dims=('records', 'fight outcomes'))
    prior_a_record_coeff = pm.Normal('a_record_coeff', mu=0, sigma=5, shape=(3,))
    prior_b_record_coeff = pm.Normal('b_record_coeff', mu=0, sigma=5, shape=(3,))

    # Select the probabilities based on weight class
    outcome_weightclass = prior_weightclass[weightclasses]
    outcome_a_camp = prior_a_camp[a_camp]
    outcome_b_camp = prior_b_camp[b_camp]
    outcome_a_style = prior_a_style[a_style]
    outcome_b_style = prior_b_style[b_style]
    outcome_a_record = pm.math.dot(a_record, prior_a_record_coeff)
    outcome_b_record = pm.math.dot(b_record, prior_b_record_coeff)

    log_p_outcome_weightclass = pm.math.log(outcome_weightclass)
    log_p_outcome_a_camp = pm.math.log(outcome_a_camp)
    log_p_outcome_b_camp = pm.math.log(outcome_b_camp)
    log_p_outcome_a_style = pm.math.log(outcome_a_style)
    log_p_outcome_b_style = pm.math.log(outcome_b_style)
    log_p_outcome_a_record = pm.math.log(a_record)
    log_p_outcome_b_record = pm.math.log(b_record)

    # Combine log-probabilities
    log_combined_prob = (
        + log_p_outcome_a_camp
        + log_p_outcome_b_camp
        + log_p_outcome_a_style
        + log_p_outcome_b_style
        + log_p_outcome_a_record
        + log_p_outcome_b_record
    )

    # Convert back to probabilities and normalize
    combined_prob = pm.math.softmax(log_combined_prob, axis=-1)

    # Likelihood for the first model
    obs = pm.Categorical("obs", p=combined_prob, observed=fight_outcome_idx)

    trace = pm.sample()


It’s not entirely clear to me how you are approaching this (right or wrong), but you might want to check out @Martin_Ingram's blog post (here) that involves tennis skill ratings. @Martin_Ingram has quite a thing for sports modeling, and his PyMC code might provide a good template for dealing with other sports.

Always looking for more resources on sports modeling, so thank you! I'm already diving into his blog and I’m sure I'll be annoying Martin with questions soon. Haha.

TL;DR: there are 3 “layers” of data I've compiled: biographical (fight style, training camp), historical (wins/losses by decision, KO, submission), and fight-level data (strikes per minute, knockdowns, etc.).

Ideally I'm trying to reduce everything to a log probability for each feature, then use softmax to produce an overall probability for each outcome.
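As a rough illustration of the idea above (not the poster's actual model), the NumPy sketch below shows that adding log-probabilities and applying softmax is the same as multiplying the per-feature probabilities and renormalizing. The arrays `p_camp` and `p_style` are made-up values used only for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-fight outcome probabilities (dec, KO, sub) from two features.
p_camp = np.array([[0.5, 0.3, 0.2],
                   [0.2, 0.6, 0.2]])
p_style = np.array([[0.4, 0.4, 0.2],
                    [0.3, 0.3, 0.4]])

# Adding log-probabilities multiplies the underlying probabilities ...
log_combined = np.log(p_camp) + np.log(p_style)

# ... and softmax renormalizes each row so it sums to 1.
combined = softmax(log_combined, axis=-1)

# The same result computed directly as a normalized product.
product = p_camp * p_style
direct = product / product.sum(axis=-1, keepdims=True)
```

Every term in the sum needs shape (fights, outcomes) for this to work, so each feature has to contribute one value per outcome for every fight.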

I think I’ve modeled biographical data well so far. But now I’m trying to layer in a fighter’s record. So if 40% of a fighter’s fights end in a decision, 35% by KO, and 25% by submission, how likely is his next one to end in a decision/KO/submission?
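One possible reason for very small or infinite initial log-probabilities, assuming some fighters have no fights ending a particular way, is that taking the log of a record proportion of exactly 0 gives -inf. The sketch below uses made-up counts and an assumed pseudocount `alpha` to show how additive smoothing keeps the log finite; whether this applies depends on the actual data.

```python
import numpy as np

# Hypothetical outcome counts per fighter: columns are (dec, KO, sub).
counts = np.array([[2, 2, 1],
                   [4, 0, 0]])  # the second fighter has only decisions

# Raw proportions can contain exact zeros, and log(0) is -inf.
raw = counts / counts.sum(axis=1, keepdims=True)
with np.errstate(divide="ignore"):
    raw_log = np.log(raw)

# Additive smoothing: the pseudocount of 1 is an assumption, not a recommendation.
alpha = 1.0
smoothed = (counts + alpha) / (counts + alpha).sum(axis=1, keepdims=True)
log_record = np.log(smoothed)
```

The smoothed rows still sum to 1, so they can be used wherever the raw proportions were.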

The problem: I’m having trouble conceptualizing how to transform a fighter's record into a log probability. I tried modeling some kind of coefficient with shape (3,) to apply to outcome_a_record and outcome_b_record, but that has produced the error below.

I don't know if that makes sense, but any conceptual help would be greatly appreciated.

ValueError                                Traceback (most recent call last)
File /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/pytensor/compile/function/types.py:970, in Function.__call__(self, *args, **kwargs)
    968 try:
    969     outputs = (
--> 970         self.vm()
    971         if output_subset is None
    972         else self.vm(output_subset=output_subset)
    973     )
    974 except Exception:

ValueError: Input dimension mismatch. One other input has shape[1] = 3, but input[1].shape[1] = 5916.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[35], line 76
     73 # Likelihood for the first model
     74 obs1 = pm.Categorical("obs1", p=combined_prob, observed=fight_outcome_idx)
---> 76 trace = pm.sample()

File ~/pymc3/pymc/sampling/mcmc.py:593, in sample(draws, tune, chains, cores, random_seed, progressbar, step, nuts_sampler, initvals, init, jitter_max_retries, n_init, trace, discard_tuned_samples, compute_convergence_checks, keep_warning_stat, return_inferencedata, idata_kwargs, nuts_sampler_kwargs, callback, mp_ctx, model, **kwargs)
    591         [kwargs.setdefault(k, v) for k, v in nuts_kwargs.items()]
    592     _log.info("Auto-assigning NUTS sampler...")
--> 593     initial_points, step = init_nuts(
Inputs values: ['not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown', 'not shown']
Outputs clients: [[Softmax{axis=-1}(Elemwise{Composite}[(0, 0)].0)]]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.

Hey Andrew and Christian,

Thanks for the ping, and also for the friendly email :slight_smile: I’m not that active here these days, I’d like to be, but I’m not finding much time at the moment, unfortunately.

Regarding your questions: If you’re interested in multiple outcomes, reading about multinomial regression might be helpful. I had a look, and it seems @cluhmann previously (How to fit multivariate multinomial model) referred to a notebook: https://nbviewer.org/github/cluhmann/DBDA-python/blob/master/Notebooks/Chapter%2022.ipynb . That seems to link to the Kruschke book, so that’s where I would look. Try to code up a multinomial logistic regression, then see if you can link things back to your problem. I had a look through my own code, but in tennis I’ve usually just done binary outcomes, not multivariate, so I unfortunately don’t have anything that’s immediately applicable.

Looking at the error you’re getting, it seems like a shape issue. I think your first input should indeed have shape[1]=3 – one column for each outcome – but fight_outcome should probably just be a vector of the length of your data points. Basically, if you have N data points and M outcomes, I think p should be of shape N x M, and observed should be N (please correct me if I’m wrong @cluhmann ).
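To make the shape point concrete, here is a small NumPy sketch with made-up sizes (N = 5 fights, M = 3 outcomes). It assumes the failing version of the model added the dot product of the record array with a length-3 coefficient vector to the other (N, M) log-probability terms; that would be consistent with the "shape[1] = 3" versus "shape[1] = 5916" message, but it is a guess from the traceback.

```python
import numpy as np

N, M = 5, 3  # hypothetical sizes: 5 fights, 3 outcomes
rng = np.random.default_rng(0)

record = rng.dirichlet(np.ones(M), size=N)  # (N, M) normalized records
coeff_vec = rng.normal(size=M)              # (M,), like shape=(3,)
logits_other = rng.normal(size=(N, M))      # stand-in for the other terms

# A matrix-vector product collapses the outcome axis: (N, M) @ (M,) -> (N,)
term = record @ coeff_vec

# (N, M) + (N,) aligns the trailing axes, so M is compared with N and fails.
try:
    logits_other + term
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# What a categorical likelihood expects: p is (N, M), observed is (N,).
p = np.exp(logits_other)
p = p / p.sum(axis=1, keepdims=True)
observed = rng.integers(0, M, size=N)
```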

Hope that at least gets you started, let me know!


This is excellent, and thank you for the resources. I figured you were busy, but it was worth a shot :blush:

I'm so pumped you mentioned the shape thing, because that was my first guess, so I'm glad I’m on the right path. I guess the biggest hurdle is more conceptual: my variables have different shapes, and I feel like they should all be able to fit together, but I can’t wrap my head around how.

I.e., there are some variables with a shape that will broadcast to all 3 outcomes at once:

          [Fight Camp]
        /       |      \
      [Dec,    KO,      Sub]

And there are some variables that have a different shape but should still be able to broadcast to the three outcomes?

    [Fighter A Dec %,             Fighter A KO %,             Fighter A Sub %]
    /      |       \            /      |        \             /      |       \
 [[Dec,    KO,      Sub],    [Dec,    KO,      Sub],      [Dec,    KO,      Sub]]

Is the distribution of the coefficient the real latent variable I need to model here? The relationship between the fighter’s record and the fight outcome is really defined by the coefficient, right?

Thanks so much for the help so far.
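One way to read the second diagram, offered as a sketch rather than a definitive answer, is as a 3 x 3 coefficient matrix: each record category (rows) gets its own weight for each outcome (columns), which is the multinomial-logistic-regression form suggested earlier in the thread. The names `W`, `record_a`, and `base_logits` are hypothetical and the values are random; in PyMC the analogous pieces would presumably be a `pm.Normal` with `shape=(3, 3)` combined via `pm.math.dot`.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row-wise max for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

N, M = 4, 3  # hypothetical sizes: 4 fights, 3 outcomes
rng = np.random.default_rng(1)

record_a = rng.dirichlet(np.ones(M), size=N)  # (N, 3) fighter A records
base_logits = rng.normal(size=(N, M))         # stand-in for camp/style terms

# Hypothetical coefficients: row = record category, column = fight outcome.
W = rng.normal(size=(M, M))

# (N, 3) @ (3, 3) -> (N, 3): one logit per outcome for every fight.
record_logits = record_a @ W

# Same shape as the other terms, so they add and normalize row by row.
p = softmax(base_logits + record_logits, axis=-1)
```

Because each record row sums to 1, the coefficients are not uniquely identified without priors or a reference category, which is worth keeping in mind when interpreting them.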