I am currently building a rating/skill model very similar to what @sbussmann is doing.
Let's say each player's skill has a N(0, 1) prior. I am now at the point where I want to add time dependence, essentially an AR(1) process: y_t ~ N(rho * y_{t-1}, sigma), where t is a period in time. Within each period, multiple matches are played and observed as data, for example:
| t | Team 1 | Team 2 | Winner |
|---|--------|--------|--------|
Thus the data is binned into periods, and that's the main source of my confusion.
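To make the generative process concrete, here is a minimal NumPy simulation of the skill evolution I have in mind (the values rho = 0.8 and sigma = 0.25 are just the ones used in the model below):

```python
import numpy as np

rng = np.random.default_rng(0)
n_periods, n_teams = 10, 30
rho, sigma = 0.8, 0.25

# skill[0] ~ N(0, 1); skill[t] = rho * skill[t-1] + N(0, sigma) noise
skill = np.empty((n_periods, n_teams))
skill[0] = rng.normal(0.0, 1.0, size=n_teams)
for t in range(1, n_periods):
    skill[t] = rho * skill[t - 1] + rng.normal(0.0, sigma, size=n_teams)

print(skill.shape)  # one row of team skills per period
```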
I am basically replicating this part of the model, or more specifically this part:

Here's my version. It samples, but I believe it is extremely inefficient and unscalable. I changed some code to simplify the example:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_periods = 10
n_teams = 30
# obs_team_1 = N-length vector of team 1 ids
# obs_team_2 = N-length vector of team 2 ids
# obs_period = N-length vector of time periods, sorted,
#              e.g. [0,0,0,0,1,1,1,2,2,3,3,3,3,3,3,4,4,4]
# obs_winner = N-length vector of binary outcomes (1 if team 1 won)

with pm.Model() as rating_model:
    rho = pm.Normal('rho', 0.8, 0.2)

    # one rating vector per period, each centered on rho * (previous period)
    time_rating = [pm.Normal('rating_0', 0, 1, shape=n_teams)]
    for i in np.arange(1, n_periods):
        time_rating.append(
            pm.Normal('rating_' + str(i), rho * time_rating[i - 1], 0.25,
                      shape=n_teams))

    # probably the second biggest bottleneck
    diff = [time_rating[i][obs_team_1[obs_period == i]]
            - time_rating[i][obs_team_2[obs_period == i]]
            for i in np.arange(n_periods)]
    diff = tt.concatenate(diff)  # biggest bottleneck from profiling

    p = pm.math.sigmoid(diff)
    wl = pm.Bernoulli('observed wl', p=p, observed=obs_winner)
```
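For what it's worth, the per-period list comprehension plus `concatenate` computes the same thing as a single fancy-indexing operation on a stacked (n_periods, n_teams) rating matrix. A NumPy sketch (the random data here is just a stand-in for the actual observations and ratings) checking the equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
n_periods, n_teams, n_obs = 10, 30, 18
obs_team_1 = rng.integers(0, n_teams, size=n_obs)
obs_team_2 = rng.integers(0, n_teams, size=n_obs)
obs_period = np.sort(rng.integers(0, n_periods, size=n_obs))

# stand-in for the stacked per-period ratings
rating = rng.normal(size=(n_periods, n_teams))

# one fancy-indexing op in place of the per-period loop + concatenate
diff_vec = rating[obs_period, obs_team_1] - rating[obs_period, obs_team_2]

# the looped version, for comparison (obs_period is sorted, so the
# concatenation order matches the observation order)
diff_loop = np.concatenate(
    [rating[i][obs_team_1[obs_period == i]]
     - rating[i][obs_team_2[obs_period == i]]
     for i in range(n_periods)])

assert np.allclose(diff_vec, diff_loop)
```

The same double-index pattern works on a Theano tensor, so if the per-period ratings were a single 2-D random variable rather than a Python list, the loop and the `tt.concatenate` could presumably be dropped entirely.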
How would I improve the sampling efficiency of this model, and how could I make use of the AR1 or GaussianRandomWalk classes that are available?