Unexpected results of hierarchical model for prediction of election results



I am currently trying to implement the following model:

I have a number of political experts, who in the past have made estimation about the outcome of elections.
I assume each expert has an accuracy/edge which makes his prediction better than a random guess.
I have built a model, which allows me to both estimate an experts accuracy based on past events, and make inference about the outcome of a future event of which only the experts’ predictions are known to the model.

While the model can reconstruct the experts’ accuracy rather well, the implied probabilities for future events are not useful and the posterior looks way off what I would expect given the model.

This is a working minimum example:

# Imports
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    import pymc3 as pm
    import seaborn as sns

# Experts and Events
    n_train_events = 10
    n_test_events = 2
    n_events = n_train_events + n_test_events
    event_outcomes = np.random.binomial(1, 0.5, n_events)

    n_experts = 10
    n_observations = n_events * n_experts
    # use a Beta (2,2)
    expert_accuracy = np.random.beta(2, 2, n_experts)

# Dummy Dataset
    expert_event_df = pd.DataFrame(columns=['expert_id', 'event_id', 'expert_pred', 'true_outcome'],
    i = 0
    for event_idx in range(n_events):    
        event_outcome = event_outcomes[event_idx]
        correct_pred = np.random.binomial(1, expert_accuracy)
        random_guess = np.random.binomial(1, 0.5 * np.ones(n_experts))
        event_pred = correct_pred * event_outcome  + \
                        (1 - correct_pred) * random_guess
        idx_slice = expert_event_df.index.isin(expert_event_df.index[i:i+n_experts])
        expert_event_df.loc[idx_slice, 'expert_id'] = np.arange(n_experts)
        expert_event_df.loc[idx_slice, 'event_id'] = event_idx
        expert_event_df.loc[idx_slice, 'expert_pred'] = event_pred
        expert_event_df.loc[idx_slice, 'true_outcome'] = event_outcome
        i+= n_experts

# PyMC3 Model
    expert_lookup = expert_event_df['expert_id'].astype(int).values
    event_lookup = expert_event_df['event_id'].astype(int).values
    observed_pred = expert_event_df['expert_pred'].values
    # Test Event Outcomes are masked so that they are sampled
    obs_outcomes = event_outcomes.copy()
    obs_outcomes[-n_test_events:] = -1
    masked_data = np.ma.masked_array(obs_outcomes, obs_outcomes == -1, fill_value=-1)

    with pm.Model() as model:
        pm_expert_accuracy = pm.Beta('pm_expert_accuracy',
                                     alpha=1.0, beta=1.0, 
        pm_event_prob = pm.Beta('pm_event_prob', 
                                alpha=1.0, beta=1.0,
        pm_event_outcome = pm.Bernoulli('pm_event_outcome', 
        broadcast_expert_accuracy = pm_expert_accuracy[expert_lookup]
        broadcast_true_outcome = pm_event_outcome[event_lookup]
        pm_expert_pred = pm.Deterministic('pm_expert_pred',
                                          broadcast_expert_accuracy * (broadcast_true_outcome - 0.5) + 0.5)
        expert_predictions = pm.Bernoulli('expert_predictions', 

# Sampling
    with model:
       trace = pm.sample(10000)
       burned_trace = trace[1000:]

The problem is now with the posterior samples for pm_event_outcome_missing, so for the events whose outcomes I have masked. Even if I modify the dataset in a way that the experts’ estimates for the two events are completely random, the posterior distribution is always almost singular around 0 or 1.

Is there any obvious thing I am missing? Any advice on how to achieve a model which is better at reflecting uncertainty?

Thank you very much and kind regards.


For the prediction to work, you need to formula the election result as some kind of function of the prediction from the experts. It might sound a bit weird, but think of it as forecasting. Otherwise, what you are doing now is basically sampling from the prior Beta(1., 1.) Because of the inefficiency of Metropolis, the sampler will stuck quite a lot which gives you lots of either 0 or 1.


Actually, the way you are modelling should be fine - the problem might be just the inefficiency of Metropolis then. You can generate posterior prediction like this maybe:
post_predi = np.random.binomial(p=trace['pm_event_prob'][:,10:], n=1)


Hey, thank you for taking your time to have a look at this. In what way does the Metropolis sampler behave inefficient in this case?
Even the posterior sample for ‘pm_event_prob’ is somewhat contradictory to what I expect. For once, despite a high number events and experts, the shape is very much dependent on the prior I pass for it. If I give a uniform prior, the histogram is always triangular, while for a different beta prior it is smoother. Additionally, independent of the event and the consensus per event, the posterior of ‘pm_event_prob’ is always taking on one of two “shapes”, with most probability mass at 0 or 1, but there are no gradual differences which would indicate higher or lower uncertainty.
Is my model misspecified or am I missing something about the internal mechanics of pymc3?


Looking at the trace the value rarely move, which means most of the proposal value is rejected. This is usually an indication of inefficiency.

As for the shape of the posterior, I would reason that this is the way the model is. As from a uniform or Beta(1, 1) prior with 1 observation (indeed, each of the event only observed once), the posterior is necessary triangular.


Thank you for looking into this and sorry for the late reply, was on vacation.

What you say in regard to the shape of the posterior makes sense, didn’t consider this.

In regard to the sampler inefficiency: I do understand the issue with this, but is there any way to overcome this? I reckon given that, the posterior traces for the unobserved election events are not meaningful/ this model can not be used for this purpose?


I would try a different way to model it just to compare, by using a logistic regression. In this way, even the election is not yet happen, you can use the linear regression to do prediction.
This implementation is pretty standard, so it should be a nice baseline as well.