Add additional information to the Rugby example

MasLaMola · June 27, 2022, 12:27pm

I was wondering how one might go about improving the model if additional data were available in the Rugby dataset.

https://docs.pymc.io/en/v3/pymc-examples/examples/case_studies/rugby_analytics.html

For example, suppose the original dataset had an extra column containing the difference in team quality. How could this be added to the model as a global parameter?

I have tried the following updates:

with pm.Model(coords=coords) as model:
    # constant data
    home_team = pm.Data("home_team", home_idx, dims="match")
    away_team = pm.Data("away_team", away_idx, dims="match")

    # global model parameters
    home = pm.Normal("home", mu=0, sigma=1)
    sd_att = pm.HalfNormal("sd_att", sigma=2)
    sd_def = pm.HalfNormal("sd_def", sigma=2)
    intercept = pm.Normal("intercept", mu=3, sigma=1)
    team_quality = pm.Normal("team_quality", mu=0, sigma=2)

    # team-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sigma=sd_att, dims="team")
    defs_star = pm.Normal("defs_star", mu=0, sigma=sd_def, dims="team")

    atts = pm.Deterministic("atts", atts_star - tt.mean(atts_star), dims="team")
    defs = pm.Deterministic("defs", defs_star - tt.mean(defs_star), dims="team")

    team_qual = pm.Deterministic("quals", team_quality * all_df["team_quality"].values, dims="team")

    home_theta = tt.exp(intercept + home + atts[home_idx] + defs[away_idx] + team_qual[home_idx])
    away_theta = tt.exp(intercept + atts[away_idx] + defs[home_idx] + team_qual[away_idx])

    # likelihood of observed data
    home_points = pm.Poisson(
        "home_points",
        mu=home_theta,
        observed=df_all["home_score"],
        dims=("match"),
    )
    away_points = pm.Poisson(
        "away_points",
        mu=away_theta,
        observed=df_all["away_score"],
        dims=("match"),
    )

This code starts sampling but then fails with “RuntimeError: Chain 3 failed.”
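One possible contributor to a failed chain (a guess, not diagnosed in this thread) is scale. With a log link, a coefficient drawn from Normal(0, 2) multiplied by an unstandardized covariate can push exp(...) to absurd Poisson rates. The sketch below assumes a hypothetical quality-difference column ranging over roughly ±40 (made-up values, not the real data) and compares the implied home rate at one prior standard deviation of the coefficient, before and after standardizing the column.

```python
import numpy as np

# Hypothetical per-match quality differences on a raw scale (assumed values).
quality_diff_raw = np.array([-40.0, -15.0, 0.0, 10.0, 25.0, 40.0])

# Standardize: zero mean, unit standard deviation.
quality_diff_std = (quality_diff_raw - quality_diff_raw.mean()) / quality_diff_raw.std()

intercept = 3.0  # prior mean of "intercept" in the model above
beta = 2.0       # one prior SD of the "team_quality" coefficient (sigma=2)

# Poisson rates implied by the log link, ignoring the other terms.
rate_raw = np.exp(intercept + beta * quality_diff_raw)
rate_std = np.exp(intercept + beta * quality_diff_std)

print(f"max rate, raw covariate:          {rate_raw.max():.3g}")
print(f"max rate, standardized covariate: {rate_std.max():.3g}")
```

Standardizing the column keeps the exponent within a few units of the intercept instead of letting it grow with the covariate's original units.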

My question is: how does one correctly incorporate an additional variable taken directly from the data?
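If the new column is the per-match quality difference stored in `df_all`, it already has one value per match, so it can enter each match's log-rate directly with a single global coefficient; `home_idx` and `away_idx` are only needed to pick entries out of team-level vectors such as `atts` and `defs`. A minimal NumPy sketch of the linear predictor, assuming `quality_diff` is home minus away quality (so it enters the away rate with the opposite sign; an assumption, not stated in the question) and has been standardized. The names `log_rates`, `quality_diff` and `beta` are hypothetical, and all values are made up.

```python
import numpy as np

def log_rates(intercept, home, atts, defs, beta, quality_diff, home_idx, away_idx):
    """Return (home_log_rate, away_log_rate), one entry per match.

    atts, defs:    team-level arrays, length n_teams (indexed by team).
    quality_diff:  match-level array, length n_matches (used as-is, not indexed).
    beta:          scalar global coefficient.
    """
    home_log_rate = intercept + home + atts[home_idx] + defs[away_idx] + beta * quality_diff
    away_log_rate = intercept + atts[away_idx] + defs[home_idx] - beta * quality_diff
    return home_log_rate, away_log_rate

# Toy example: 3 teams, 4 matches.
atts = np.array([0.2, -0.1, -0.1])
defs = np.array([-0.1, 0.0, 0.1])
home_idx = np.array([0, 1, 2, 0])
away_idx = np.array([1, 2, 0, 2])
quality_diff = np.array([0.5, -1.0, 0.0, 1.5])

h, a = log_rates(3.0, 0.25, atts, defs, 0.3, quality_diff, home_idx, away_idx)
home_theta, away_theta = np.exp(h), np.exp(a)
print(home_theta.shape, away_theta.shape)  # (4,) (4,)
```

The same indexing carries over to the tensor expressions inside the PyMC model; a `pm.Deterministic` wrapping `beta * quality_diff` would then take `dims="match"` rather than `dims="team"`.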

DanWeitzenfeld · July 1, 2022, 8:53pm

I think you are best off using the external “team quality” data when defining atts_star and defs_star.

For example, in the paper on which the Rugby model is based, the authors set different priors for atts/defs depending on whether a team is a bottom, middle, or top team.
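A simplified sketch of tier-dependent priors, assuming each team's tier (bottom, middle, top) is fixed in advance from the external quality data; this only illustrates the indexing and is not necessarily how the paper implements it. The tier assignments, tier means and `sd_att` value are hypothetical. In the PyMC model, `atts_star` would use `mu=tier_mu[tier_idx]` instead of `mu=0`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tier per team: 0 = bottom, 1 = middle, 2 = top.
tier_idx = np.array([2, 2, 1, 1, 0, 0])

# Hypothetical prior means for attacking strength in each tier.
tier_mu = np.array([-0.3, 0.0, 0.3])
sd_att = 0.2

# One prior draw: each team is centered on its tier mean rather than on 0.
atts_star = rng.normal(loc=tier_mu[tier_idx], scale=sd_att)

# Same sum-to-zero centering as in the rugby model.
atts = atts_star - atts_star.mean()
print(np.round(atts, 2))
```

For `defs_star` the tier means would take the opposite sign, since in this model a lower `defs` value means fewer points conceded.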

Alternatively, you could standardize your team quality scores and set att[i] = quality[i] + noise, so that the model treats your team quality scores as a noisy signal of true attacking/defending ability.
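A minimal sketch of that suggestion as a prior draw, assuming one external quality score per team (hypothetical values below). The `gamma` scale is an illustrative extension not mentioned in the reply; with `gamma = 1.0` it reduces to att[i] = quality[i] + noise. In the PyMC model this would correspond to passing `mu=gamma * quality_std` to the `atts_star` prior instead of `mu=0`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical external quality score for each of 6 teams (one value per team).
quality = np.array([82.0, 78.0, 75.0, 70.0, 64.0, 55.0])

# Standardize so the scores are unitless and centered at zero.
quality_std = (quality - quality.mean()) / quality.std()

sd_att = 0.2  # noise: how far true attacking ability may stray from the score
gamma = 1.0   # hypothetical scale; 1.0 gives att[i] = quality[i] + noise

# One prior draw of att[i] = gamma * quality[i] + noise.
atts_star = rng.normal(loc=gamma * quality_std, scale=sd_att)
atts = atts_star - atts_star.mean()  # sum-to-zero, as in the rugby model
```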
