Qualitatively different results depending on shape parameter

Hello,

Expanding on the example of a hierarchical model for rugby prediction, I ran into an issue.

After having sampled from the posterior of teams’ parameters I now want to estimate the probabilities of a new game.

One way would be to sample random indices from the trace, chose the appropriate team indices of the match I want to simulate and sample away, as in:

    n = samples * size`    
    results = np.zeros((n,n_teams))
    for i, idx in enumerate(np.random.randint(0, len(trace), samples)):
        with pm.Model() as pred_model:
           home = trace["home"][idx]
            intercept = trace["intercept"][idx]

            atts = trace["atts"][idx]
            defs = trace["defs"][idx]

            home_theta = np.exp(intercept + home + atts[home_team_idx] + defs[away_team_idx])
            away_theta = np.exp(intercept + atts[away_team_idx] + defs[home_team_idx])

            home_goals = pm.Poisson('home_goals', mu=home_theta)
            away_goals = pm.Poisson('away_goals', mu=away_theta)

        with pred_model:
            pred_trace = pm.sample(size, progressbar=True)

        results[i*size:(i+1)*size] = pred_trace["home_goals"] - pred_trace["away_goals"]

    prob_home = (results > 0).sum(axis=0)/n
    prob_draw = (results == 0).sum(axis=0)/n
    prob_away = (results < 0).sum(axis=0)/n

However, I want to use a sufficiently large number of samples from the posterior. For each sample a model has to be compiled, which is relatively slow.

I thought I’d be smart and only build model and put all the sampled parameters from the posterior in the shape of the Poisson variables, like:

random_idx = np.random.randint(0, len(trace), samples)
with pm.Model() as pred_model:
    home = trace["home"][random_idx]
    intercept = trace["intercept"][random_idx]

    atts = trace["atts"][random_idx]
    defs = trace["defs"][random_idx]

    home_theta = np.exp(intercept+ 
                    home + 
                    atts[:, home_team_idx] + defs[:, away_team_idx])

    away_theta = np.exp(intercept + 
                    atts[:, away_team_idx] + defs[:, home_team_idx])
    
    home_goals = pm.Poisson('home_goals', mu=home_theta, shape=(samples))
    away_goals = pm.Poisson('away_goals', mu=away_theta, shape=(samples))

with pred_model:
    pred_trace = pm.sample(size,progressbar=True)

And then get the estimated probabilities through

sampled_home_goals = pred_trace["home_goals"][:,:].flatten()
sampled_away_goals = pred_trace["away_goals"][:,:].flatten()

results = sampled_home_goals - sampled_away_goals

prob_home = (results > 0).sum()/(samples * size)
prob_draw = (results == 0).sum()/(samples * size)
prob_away = (results < 0).sum()/(samples * size)

The last step isn’t particularly elegant, but this should do the job. To model compiles and samples, but the results are way off from what the slower iterative model generates. For a smaller number of sampled indices from the trace, like 5, the results are also sensible, but not far beyond that.

I’m not sure whether this is a bug, or there is a check for the shape in the Metropolis sampler, which results in a reduced precision, but somesthing doesn’t seem quite right. Additionally I was wondering why the iterative recompiling of my model is so slow? It appears to not increase linearly in the number of iterations.

Thank you very much!

Why not use ppc = pm.sample_ppc and index to the game with the team combination (e.g., team A vs team B) you would like?

You can also use numpy to generate random prediction to avoid compiling a new model:

n = samples * size
results = np.zeros((n,n_teams))
for i, idx in enumerate(np.random.randint(0, len(trace), samples)):
    home = trace["home"][idx]
    intercept = trace["intercept"][idx]

    atts = trace["atts"][idx]
    defs = trace["defs"][idx]

    home_theta = np.exp(intercept + home + atts[home_team_idx] + defs[away_team_idx])
    away_theta = np.exp(intercept + atts[away_team_idx] + defs[home_team_idx])

    home_goals = np.random.poisson(lam=home_theta, size=size)
    away_goals = np.random.poisson(lam=away_theta, size=size)

    results[i*size:(i+1)*size] = home_goals - away_goals

prob_home = (results > 0).sum(axis=0)/n
prob_draw = (results == 0).sum(axis=0)/n
prob_away = (results < 0).sum(axis=0)/n