I am trying to adapt the election poll modeling code http://srome.github.io/Dont-Solve-Simulate-Markov-Chain-Monte-Carlo-Methods-with-PyMC3/
to a multi-party election scenario.
The following snippet adapts it to selective polling by using masked_array, so that rather than an array of four pollers, my code now sees the polls array as polls per day, where each entry could be any poller (including possibly multiple polls from the same poller in one day).
polls = ma.masked_invalid(np.full([max_polls, num_days], np.nan))
num_polled = ma.masked_equal(np.full([max_polls, num_days], -1).astype(int), -1)
pollers = np.full([max_polls, num_days], num_pollers).astype(int)
# read poll data ...
with pm.Model() as poll_model:
prior = pm.Uniform('p', 0, 1.0, shape = 1)
sigma = pm.Exponential('sigma', 1./.2, testval=.1) # No more than 20% avg
s = pm.GaussianRandomWalk('s', sd=sigma**-2, shape=num_days)
# Margin of error prior
sigma2 = pm.Uniform('margin_of_error',lower = 0, upper = .5, shape=num_pollers + 1, testval=.05)
# Deterministic formula for mu
mu = pm.Deterministic('mu',prior + s)
# Likelihood function
poll = pm.Normal('Likelihood', mu=mu, sd=sigma2[pollers], observed=polls)
I am now trying to add covariance and multiple parties based on posts like https://austinrochford.com/posts/2015-09-16-mvn-pymc3-lkj.html. But I keep getting dimension issues:
with pm.Model() as poll_model:
prior = pm.Uniform('p', 0, 1.0, shape = [num_parties])
party_s_corr = pm.Lognormal('s_corr',
np.zeros(num_parties),
np.ones(num_parties), shape=num_parties)
party_corr_lkj = pm.LKJCorr('corr_lkj', n=1, p=num_parties)
party_corr_matrix = party_corr_lkj[party_tri_index]
party_corr_matrix = T.fill_diagonal(party_corr_matrix, 1)
party_s_diag = T.diag(party_s_corr)
party_cov_matrix = T.nlinalg.matrix_dot(party_s_diag, party_corr_matrix, party_s_diag)
s = pm.MvGaussianRandomWalk('s', cov=party_cov_matrix,
shape=[num_days, num_parties])
#
# Margin of error prior
sigma2 = pm.Uniform('margin_of_error',lower = 0, upper = .5,
shape=num_pollers + 1, testval=.05)
#
# Deterministic formula for mu
mu = pm.Deterministic('mu',prior + s)
# Likelihood function
poll = pm.Normal('Likelihood', mu=mu, sd=sigma2[pollers],
observed=polls)
I have played around with placing the input arrays in different dimension configurations, but I keep getting errors such as “Input dimension mis-match” or “operands could not be broadcast together with remapped shapes [original->remapped]: (92,11) and requested shape (92,11,5)” (there are 92 days, 11 parties, and 5 maximum polls a day in my data), both when using pm.Normal or pm.MvNormal in the end.