In light of the upcoming Israeli elections (now almost a month away), I created a model to analyze the polls. The code is now available at https://github.com/byblian/pyhoshen and you can see example results of the ‘polls-only’ model at https://twitter.com/pyHoshen
I also made a graph of the correlation distributions, which may be of general interest.
My main problem at this time is that the code takes ~3+ hours to run on Google Colab. I am not sure if that is due to the difficulty of the model given the data (16 parties, ~10 pollsters, ~30 polls over the last 30 days), to problems in the model (this is my first pymc3 model), or to the performance of Google Colab (I don’t know how it will perform on other platforms). I am assuming all of the above.
My background is in software, not statistics. I have spent quite a lot of time exploring various parameterizations, but none of them ran fast. At best, sampling reached 1-5 iterations/s. So far, my main concern has been ensuring that the model converges appropriately (within Google’s 12-hour timeframe).
So I would appreciate help and advice regarding the polls-only model, which converges, but slowly (3 hours).
The ‘mult-mean-variance’ model, which attempts to model house effects as a per-pollster, per-party coefficient multiplied on the mean, with an additional per-pollster variance, has now been running for 10 hours. It has almost converged but is still problematic. I don’t know yet how well the ‘variance’ model will perform.
The basic definitions of the model are in models.py, which I tried to document extensively.
The class breakdown might be a little complicated, but conceptually it is really simple: a random walk going back from the forecast day, with polls modeled as a MvStudentT and the innovations modeled as MvNormal, all sharing the same basic Cholesky matrix (16x16).
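To make that structure concrete, here is a minimal NumPy simulation sketch of the same idea. It is not the pyhoshen code and does no inference: it only draws a backward random walk with multivariate normal innovations and one multivariate Student-t poll, both using one shared Cholesky factor. The party count, covariance, starting support, degrees of freedom, and the `draw_poll` helper are all made up for illustration, and the simulated support is not constrained to sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)

n_parties = 4   # the real model has 16; kept small for illustration
n_days = 30     # number of days going back from the forecast day
nu = 4          # Student-t degrees of freedom (assumed value)

# Hypothetical covariance and its Cholesky factor, shared by innovations and polls.
A = rng.normal(size=(n_parties, n_parties))
cov = 1e-4 * (A @ A.T + n_parties * np.eye(n_parties))
chol = np.linalg.cholesky(cov)

# Random walk going back in time: index 0 is the forecast day,
# index t is t days before it.
support = np.empty((n_days, n_parties))
support[0] = np.array([0.35, 0.30, 0.20, 0.15])  # made-up support shares
for t in range(1, n_days):
    innovation = chol @ rng.standard_normal(n_parties)  # draw from MvNormal(0, cov)
    support[t] = support[t - 1] + innovation


def draw_poll(mean, chol, nu, rng):
    """Draw one multivariate Student-t observation centred on `mean`."""
    z = chol @ rng.standard_normal(len(mean))  # MvNormal(0, cov) draw
    w = rng.chisquare(nu) / nu                 # chi-square mixing variable
    return mean + z / np.sqrt(w)               # heavier tails than MvNormal


# A simulated poll taken 10 days before the forecast day.
poll = draw_poll(support[10], chol, nu, rng)
```

In the actual model these quantities are latent variables to be inferred from the polls rather than simulated forward, which is where the sampling cost comes from.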
I do not have a public notebook example yet, but an example usage is as follows:
import datetime

import pymc3 as pm
from pyhoshen import israel

# Midnight at the start of today, as a datetime
forecast_day = datetime.datetime.fromordinal(datetime.date.today().toordinal())

with pm.Model() as model:
    election = israel.IsraeliElectionForecastModel(
        'https://drive.google.com/uc?id=1WYGgC3LeTkwKz0Oc2IYX5OdnyiroSA9P',
        model_type='polls-only', base_elections=[],
        forecast_day=forecast_day, eta=25)
The results can then be plotted as follows:
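import theano
theano.config.compute_test_value = 'off'

# `samples` is assumed to be the trace obtained by sampling the model above
# (the sampling step itself is not shown here).
bo = election.compute_trace_bader_ofer(samples['support'], threshold=0.0325)
election.plot_mandates(samples, bo, hebrew=False)
election.plot_party_support_evolution_graphs(samples, bo, hebrew=False)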
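For readers unfamiliar with the Bader-Ofer step, here is a simplified, standalone sketch of what a seat allocation with a 3.25% threshold looks like. It is not the pyhoshen implementation of `compute_trace_bader_ofer`: it assumes a plain highest-averages (D'Hondt) allocation, ignores surplus-vote agreements between parties, and works on a single set of vote counts rather than a trace. The function name `allocate_seats` and the example numbers are hypothetical.

```python
def allocate_seats(votes, total_seats=120, threshold=0.0325):
    """Allocate seats by highest averages among parties that pass the threshold.

    Simplified sketch: surplus-vote agreements are ignored.
    """
    total_votes = sum(votes.values())
    # Parties below the electoral threshold receive no seats.
    eligible = {p: v for p, v in votes.items() if v / total_votes >= threshold}
    seats = {p: 0 for p in eligible}
    for _ in range(total_seats):
        # The next seat goes to the party with the highest votes / (seats + 1).
        winner = max(eligible, key=lambda p: eligible[p] / (seats[p] + 1))
        seats[winner] += 1
    return seats


# Made-up example: party D has 2% of the vote and falls below the threshold.
example = allocate_seats({"A": 500, "B": 300, "C": 180, "D": 20}, total_seats=10)
print(example)  # {'A': 5, 'B': 3, 'C': 2}
```

Applying an allocation like this to every posterior draw of party support is what turns a distribution over vote shares into a distribution over mandates.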
- JSON document with configuration (this is the file provided in the above code): https://drive.google.com/open?id=1WYGgC3LeTkwKz0Oc2IYX5OdnyiroSA9P
- Google spreadsheet with polls data: https://docs.google.com/spreadsheets/d/1lqqrIp_sXir_Sz_H_y3mCvOaXGc_lmSsPaD-Y0C9eBk