Feeding Posteriors into Priors in a Multilevel Model

I am currently trying to make a sports analytics model that uses the results from the regular season as the prior for some of the parameters for the playoffs. I am using the model by Baio and Blangiardo following one of the PyMC3 examples.

My attempt at this follows the KDE linear interpolation example; however, the model I am using is multi-level, and I am unsure how to perform the KDE linear interpolation with a random variable that has multiple indices (one per team).

Here is a snippet for more context:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from scipy import stats

with pm.Model() as regular_season_model:
    # global model parameters
    home = pm.Flat('home')
    sd_att = pm.Exponential('sd_att', lam=10)
    sd_def = pm.Exponential('sd_def', lam=10)
    intercept = pm.Flat('intercept')

    # team-specific model parameters
    atts_star = pm.Normal("atts_star", mu=0, sigma=sd_att, shape=num_teams)
    defs_star = pm.Normal("defs_star", mu=0, sigma=sd_def, shape=num_teams)

    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))
    home_theta = tt.exp(intercept + home + atts[df_s.home_team_id] + defs[df_s.away_team_id])
    away_theta = tt.exp(intercept + atts[df_s.away_team_id] + defs[df_s.home_team_id])

    # likelihood of observed data
    home_points = pm.Poisson('home_points', mu=home_theta, observed=df_s.home_goals)
    away_points = pm.Poisson('away_points', mu=away_theta, observed=df_s.away_goals)
    
    regular_season_trace = pm.sample(2000, tune=2000, cores=3)

def from_posterior(param, samples, k=100):
    smin, smax = np.min(samples), np.max(samples)
    width = smax - smin
    x = np.linspace(smin, smax, k)
    y = stats.gaussian_kde(samples)(x)
    
    # what was never sampled should have a small probability but not 0,
    # so we'll extend the domain and use linear approximation of density on it
    x = np.concatenate([[x[0] - 3 * width], x, [x[-1] + 3 * width]])
    y = np.concatenate([[0], y, [0]])
    return pm.Interpolated(param, x, y)

with pm.Model() as playoff_model:
    # global model parameters
    home = pm.Flat('home')

    sd_att = from_posterior('sd_att', regular_season_trace['sd_att'])
    sd_def = from_posterior('sd_def', regular_season_trace['sd_def'])
    intercept = from_posterior('intercept', regular_season_trace['intercept'])

    # team-specific model parameters
    # not sure how to code these priors???
    atts_star = from_posterior('atts_star', regular_season_trace['atts_star'])
    defs_star = from_posterior('defs_star', regular_season_trace['defs_star'])
    # old way
    #atts_star = pm.Normal("atts_star", mu=0, sigma=sd_att, shape=num_teams)
    #defs_star = pm.Normal("defs_star", mu=0, sigma=sd_def, shape=num_teams)

    atts = pm.Deterministic('atts', atts_star - tt.mean(atts_star))
    defs = pm.Deterministic('defs', defs_star - tt.mean(defs_star))
    home_theta = tt.exp(intercept + home + atts[df_p.home_team_id] + defs[df_p.away_team_id])
    away_theta = tt.exp(intercept + atts[df_p.away_team_id] + defs[df_p.home_team_id])

    # likelihood of observed data
    home_points = pm.Poisson('home_points', mu=home_theta, observed=df_p.home_goals)
    away_points = pm.Poisson('away_points', mu=away_theta, observed=df_p.away_goals)

What I want is to use “informed priors” for the team strengths from the regular season, because the playoff sample is too small to refit team strengths: a team that loses in the first round contributes only a handful of games, and teams that go further contribute more, but still few. Some parameters I still want to fit anew. One is home: I want to see whether home advantage changes in the playoffs, and the total number of playoff games is a reasonably sized sample for a global parameter like this.

My initial thought is to create a separate variable for each team's atts_star and defs_star and to call from_posterior() on each one individually. I am unsure whether this brute-force approach would even work (would it lose the ‘nested’/‘grouped’ quality of the multi-level partial pooling?), and I am wondering whether there is a better way that keeps them as one variable with an index for each team. This also makes the from_posterior() function confusing, because I need a different KDE for each index of the random variable, and I don’t think there is a vectorized KDE. A multi-dimensional KDE would instead fit a single multivariate distribution, which is not the desired behaviour here. Needless to say, I am a bit lost and confused.
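For reference, a per-team KDE can be written with plain NumPy broadcasting. The sketch below is only an illustration under assumptions: per_team_kde_grids is a hypothetical helper (not part of PyMC3 or SciPy), it uses a Gaussian kernel with a Scott's-rule bandwidth, and the fake trace stands in for regular_season_trace['atts_star'] with shape (n_draws, n_teams). It returns one padded (x, y) grid per team, in the same spirit as from_posterior() above.

```python
import numpy as np

def per_team_kde_grids(samples, k=100, pad=3.0):
    """Hypothetical helper: one 1-D Gaussian KDE per column of `samples`.

    samples: array of shape (n_draws, n_teams).
    Returns x, y of shape (n_teams, k + 2): a padded grid and the density on it.
    Assumes every column has non-zero spread.
    """
    samples = np.asarray(samples, dtype=float)
    n_draws, n_teams = samples.shape

    smin, smax = samples.min(axis=0), samples.max(axis=0)  # (n_teams,)
    width = smax - smin

    # One evaluation grid per team, shape (n_teams, k)
    x = smin[:, None] + width[:, None] * np.linspace(0.0, 1.0, k)[None, :]

    # Scott's-rule bandwidth for 1-D data, one value per team
    h = samples.std(axis=0, ddof=1) * n_draws ** (-1.0 / 5.0)

    # Standardised distances, shape (n_teams, k, n_draws); memory grows with all three
    z = (x[:, :, None] - samples.T[:, None, :]) / h[:, None, None]
    y = np.exp(-0.5 * z ** 2).sum(axis=2) / (n_draws * h[:, None] * np.sqrt(2.0 * np.pi))

    # Extend the domain with zero-density end points, as from_posterior() does
    x = np.concatenate(
        [x[:, :1] - pad * width[:, None], x, x[:, -1:] + pad * width[:, None]], axis=1
    )
    zeros = np.zeros((n_teams, 1))
    y = np.concatenate([zeros, y, zeros], axis=1)
    return x, y

# Fake posterior draws for 4 teams (illustration only, not real results)
rng = np.random.default_rng(0)
fake_trace = rng.normal(loc=[0.0, 1.0, 2.0, 3.0], scale=0.3, size=(500, 4))
x_grid, y_grid = per_team_kde_grids(fake_trace)
print(x_grid.shape, y_grid.shape)
```

Each row (x_grid[i], y_grid[i]) could then, presumably, be passed to pm.Interpolated under a per-team name and the results combined with tt.stack. Note that once every team's prior is a fixed interpolated density, sd_att and sd_def are no longer parents of atts_star and defs_star in the playoff model, so the playoff data would not inform them through the team strengths.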

Any help/advice/insight would be greatly appreciated. Thanks.

If I understand correctly, you have the regular season and the playoffs in the same data set. In that case I would rather model all the data in one model, with separate parameters for the regular season and the playoffs that share the same hyperparameters. I would not use KDE, because per-parameter KDEs do not capture the correlations between parameters.
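To illustrate the point about correlation, here is a small NumPy sketch. It assumes a made-up two-parameter "posterior" with correlation -0.8 (not taken from the actual model) and mimics what per-parameter priors keep: each marginal is preserved exactly, but the columns are made independent of each other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint posterior draws for two correlated parameters
cov = np.array([[1.0, -0.8],
                [-0.8, 1.0]])
joint = rng.multivariate_normal([0.0, 0.0], cov, size=5000)

# Keeping only the marginals is like shuffling each column independently:
# every marginal is unchanged, but the pairing between parameters is lost
independent = np.column_stack(
    [rng.permutation(joint[:, 0]), rng.permutation(joint[:, 1])]
)

corr_joint = np.corrcoef(joint, rowvar=False)[0, 1]
corr_indep = np.corrcoef(independent, rowvar=False)[0, 1]
print(f"correlation in joint draws:     {corr_joint:.2f}")
print(f"correlation with marginals only: {corr_indep:.2f}")
```

A prior built from separate one-dimensional densities corresponds to the second case, which is why fitting both seasons in one model may be preferable when the parameters are strongly correlated.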
