I’m trying to adapt this Bayesian ideal point model, which estimates the ideological positions of Twitter handles on a latent partisan axis from the overlap between their followers, to Wikipedia clickstream data: a monthly dataset published by the Wikimedia Foundation that records the number of clicks between each pair of linked articles.
I am modeling the clickstream flow between pairs of articles as Poisson distributed, with a rate parameter that depends on the distance between the two articles' locations (denoted `theta` and `phi`) on the latent axis. `occurrences` is a symmetric matrix of the average monthly flow (regardless of direction) between each pair of articles.
I am following some of the guidelines from the original paper, such as seeding the model with ‘positive’ and ‘negative’ anchor examples on the latent axis to induce identifiability, but the different chains often return wildly differing estimates, with unacceptable R-hat values.
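To show what I mean by unacceptable R-hat, here is a minimal NumPy sketch of the standard split-R-hat formula (the same quantity PyMC3/ArviZ report), which reproduces the failure mode on synthetic chains stuck at different modes:

```python
import numpy as np

def split_rhat(chains):
    """Split R-hat (Gelman et al.) for one scalar parameter.
    chains: array of shape (n_chains, n_draws)."""
    n_chains, n_draws = chains.shape
    half = n_draws // 2
    # Split each chain in two so within-chain drift also inflates R-hat.
    split = chains[:, :2 * half].reshape(n_chains * 2, half)
    n = split.shape[1]
    B = n * split.mean(axis=1).var(ddof=1)   # between-chain variance
    W = split.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 1000))           # chains exploring the same mode
stuck = mixed + np.arange(4)[:, None] * 3.0  # chains stuck at separated modes
print(split_rhat(mixed))  # close to 1.0
print(split_rhat(stuck))  # far above the usual ~1.01 threshold
```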
The model is as follows:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as T

# Enumerate (i, j) index pairs and keep only those with nonzero flow.
page_indexer = np.array([(i, j) for i in range(self.occurrences.shape[0])
                         for j in range(self.occurrences.shape[1])])
page_is, page_js, flow = [np.array(x) for x in zip(*[(ix[0], ix[1], val)
    for ix, val in zip(page_indexer, np.ravel(self.occurrences)) if val])]

with pm.Model():
    BoundedNormal = pm.Bound(pm.Normal, lower=-4., upper=4.)
    # A standard deviation must be positive, so HalfNormal
    # (not a Normal mislabelled 'mu_a', which can go negative).
    sd_a = pm.HalfNormal('sd_a', sd=3.)
    mu_b = pm.Normal('mu_b', mu=1., sd=1.)
    n_pages = self.occurrences.shape[0]  # symmetric, so both dims match
    theta = BoundedNormal('theta', mu=0., sd=.1, shape=n_pages)
    phi = BoundedNormal('phi', mu=0., sd=.1, shape=n_pages)
    a = pm.Normal('a', mu=0., sd=sd_a, shape=n_pages)
    b = pm.Normal('b', mu=mu_b, sd=.1, shape=n_pages)
    # Log-rate: popularity offsets minus squared latent distance;
    # the anchor pages are pinned to -4 and +4.
    lambda_hat = (a[page_is] + b[page_js] - pm.math.sqr(theta[page_js]
        - T.switch(page_is == neg_pole, -4.,
                   T.switch(page_is == pos_pole, 4., phi[page_is]))))
    # pm.math.exp, not np.exp, on a Theano tensor.
    lambda_like = pm.Poisson('lambda_like', mu=pm.math.exp(lambda_hat),
                             observed=flow)
```
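One thing I suspect: since the likelihood only involves the squared distance, the model is nearly invariant under the joint reflection `theta -> -theta`, `phi -> -phi`, which the two anchors may only weakly break, so chains can settle into mirror-image modes. A post-hoc alignment that flips each chain to match a reference chain (the function name and array shapes are my own convention) would look like:

```python
import numpy as np

def align_reflection(theta_draws):
    """Flip mirror-image chains to match chain 0.
    theta_draws: array of shape (n_chains, n_draws, n_items)."""
    ref = theta_draws[0].mean(axis=0)
    aligned = theta_draws.copy()
    for c in range(theta_draws.shape[0]):
        # Negative correlation with the reference chain => reflected mode.
        if np.dot(aligned[c].mean(axis=0), ref) < 0:
            aligned[c] = -aligned[c]
    return aligned
```

This would only rescue a pure reflection ambiguity; if the chains disagree for other reasons, R-hat would stay high even after alignment.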