Applying timeseries over periodic data

I am currently creating a model very similar to what @sbussmann is doing in terms of a rating/skill model.

Let’s say a player’s skill is represented by a N(0, 1) prior. I am now at the point where I want to add a time-dependent component, which is basically an AR(1) process: y_t ~ N(rho * y_{t-1}, sigma), where t is a period in time. During each period, multiple matches are played and observed. For example:

| t | Team 1 | Team 2 | Winner |
|---|--------|--------|--------|
| 0 | 1 | 2 | 1 |
| 0 | 4 | 3 | 3 |
| 1 | 2 | 3 | 2 |

Thus the data is binned into periods, and that binning is the main source of my confusion.

I am basically replicating this part of the model, or more specifically this part:

[image of the referenced part of the model]

Here’s my version, which I believe is extremely inefficient (unscalable) but still samples. I changed some code to simplify the example:

import numpy as np
import pymc3 as pm
import theano.tensor as tt

n_periods = 10
n_teams = 30
# obs_team_1: length-N vector of team 1 ids
# obs_team_2: length-N vector of team 2 ids
# obs_period: length-N vector of time periods, sorted, e.g. [0, 0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]
# obs_winner: length-N binary vector (1 if team 1 won)

with pm.Model() as rating_model:

    rho = pm.Normal('rho', 0.8, 0.2)
    # one Normal per period; each period's rating is centered on
    # rho times the previous period's rating
    time_rating = [pm.Normal('rating_0', 0, 1, shape=n_teams)]
    for i in np.arange(1, n_periods):
        time_rating.append(pm.Normal('rating_' + str(i), rho * time_rating[i - 1], 0.25, shape=n_teams))

    # gather the rating difference for each match, period by period
    diff = [time_rating[i][obs_team_1[obs_period == i]] - time_rating[i][obs_team_2[obs_period == i]]
            for i in np.arange(n_periods)]  # probably the second-biggest bottleneck
    diff = tt.concatenate(diff)  # biggest bottleneck from profiling
    p = pm.math.sigmoid(diff)
    wl = pm.Bernoulli('observed wl', p=p, observed=obs_winner)

How would I improve the sampling efficiency of this model, and how could I make use of the AR1 or GaussianRandomWalk classes that are available?
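For reference, a GaussianRandomWalk version of the model might look roughly like the sketch below. This is an untested sketch: it assumes pm.GaussianRandomWalk accepts a 2-D shape with time along the first axis, and a random walk fixes rho at 1, so it drops the AR(1) mean reversion.

with pm.Model() as rw_model:
    sd = pm.HalfCauchy('sd', 0.5)
    # random-walk ratings: rows are periods, columns are teams
    time_rating = pm.GaussianRandomWalk('time_rating', sd=sd,
                                        init=pm.Normal.dist(0., 1.),
                                        shape=(n_periods, n_teams))
    # index each match's ratings directly instead of looping over periods
    diff = (time_rating[obs_period, obs_team_1]
            - time_rating[obs_period, obs_team_2])
    p = pm.math.sigmoid(diff)
    wl = pm.Bernoulli('observed wl', p=p, observed=obs_winner)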

Do you have the full code and data somewhere?

Hi, see this:

Thanks. You are right that it would be much easier to model it as an AR(1) process. However, pm.AR1 is not really flexible/optimized, as it cannot take a vector as mu. Something along these lines might work:

with pm.Model() as rating_model:
    rho = pm.Beta('rho', 5, 8)
    omega = pm.HalfCauchy('omega', 0.5, shape=n_teams)  # sd of each team's initial rating
    sd = pm.HalfCauchy('sigma', 0.5)                    # innovation sd

    def AR1_vec(value):
        """vector version of AR1 over a (n_teams, n_periods) matrix
        """
        x_im1 = value[:, :-1]
        x_i = value[:, 1:]
        boundary = pm.Normal.dist(0, sd=omega).logp  # logp for rating_0
        innov_like = pm.Normal.dist(rho * x_im1, sd=sd).logp(x_i)
        return boundary(value[:, 0]) + tt.sum(innov_like)

    time_rating = pm.DensityDist('time_rating',
                                 AR1_vec,
                                 shape=(n_teams, n_periods))

    # look up each match's ratings directly by (team, period) index
    diff = time_rating[obs_team_1, obs_period] - time_rating[obs_team_2, obs_period]

    p = 0.5 * pm.math.tanh(diff) + 0.5
    wl = pm.Bernoulli('observed wl', p=p, observed=obs_wl)
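(Note that 0.5 * tanh(x) + 0.5 is exactly sigmoid(2 * x), so this link is the same as pm.math.sigmoid up to a factor of 2 on the rating difference.)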

Thanks for the quick reply! I learned something new today about DensityDist/timeseries when it’s a latent variable rather than observed :smiley:

The code doesn’t seem to sample on my end; I only get diverging samples:

Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
 73%|█████████████████████████████████████████████████████████▏                    | 1100/1500 [04:21<01:35,  4.21it/s]C:\Users\kevin.pei\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\step_methods\hmc\nuts.py:437: UserWarning: Chain 0 contains only diverging samples. The model is probably misspecified.
  % self._chain_id)
C:\Users\kevin.pei\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\step_methods\hmc\nuts.py:451: UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.0, but should be close to 0.9. Try to increase the number of tuning steps.
  % (self._chain_id, mean_accept, target_accept))
C:\Users\kevin.pei\AppData\Local\Continuum\anaconda3\lib\site-packages\pymc3\step_methods\hmc\nuts.py:467: UserWarning: Chain 0 contains 613 diverging samples after tuning. If increasing `target_accept` does not help try to reparameterize.
  % (self._chain_id, n_diverging))

I also tried reparameterizing using this:

with pm.Model() as rating_model:

    rho = pm.Beta('rho', 5, 8)
    omega = pm.HalfCauchy('omega', 0.5, shape=n_teams)
    sigma = pm.HalfCauchy('sigma', 0.5)

    def AR1_vec(value):
        x_i = value[:, 1:]
        boundary = pm.Normal.dist(0, sd=omega).logp  # logp for rating_0
        # condition each x_t directly on x_0 via the decayed mean rho**t * x_0
        decay = tt.pow(rho, np.arange(1, n_periods))
        mu = tt.dot(value[:, 0, np.newaxis], decay[:, np.newaxis].T)
        innov_like = pm.Normal.dist(mu, sd=sigma).logp(x_i)
        return boundary(value[:, 0]) + tt.sum(innov_like)

    time_rating = pm.DensityDist('time_rating', AR1_vec, shape=(n_teams, n_periods))

    diff = time_rating[obs_team_1, obs_period] - time_rating[obs_team_2, obs_period]
    p = 0.5 * pm.math.tanh(diff) + 0.5
    wl = pm.Bernoulli('observed wl', p=p, observed=(obs['Team 1 ID'] == obs['winner']).values)

But I still get the same results.

Yes, you are right, there is some problem somewhere… It is a complex model and will need some care to make it work.

My suggestion would be to start from a simple model (if you have not done so already), i.e. with all the ratings just as random normals: time_rating = pm.Normal('rating_all', 0., 1., shape=(n_periods, n_teams)), and then include a team intercept, team variance, team drift rate, etc. At each step, validate the model with generated data to make sure it is identifiable (i.e., that you can recover the true parameters); a minimal sketch of that first step follows below.
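A minimal version of that first step might look like the following sketch (the true_rating/sim_* names are invented here): simulate outcomes from known ratings, fit the static model, and check that the posterior recovers them.

import numpy as np
import pymc3 as pm

np.random.seed(42)
n_teams, n_games = 30, 2000
true_rating = np.random.normal(0., 1., n_teams)
sim_team_1 = np.random.randint(0, n_teams, n_games)
# offset by a nonzero amount so a team never plays itself
sim_team_2 = (sim_team_1 + np.random.randint(1, n_teams, n_games)) % n_teams
p_win = 1. / (1. + np.exp(-(true_rating[sim_team_1] - true_rating[sim_team_2])))
sim_wl = np.random.binomial(1, p_win)

with pm.Model() as static_model:
    rating = pm.Normal('rating', 0., 1., shape=n_teams)
    diff = rating[sim_team_1] - rating[sim_team_2]
    wl = pm.Bernoulli('wl', p=pm.math.sigmoid(diff), observed=sim_wl)
    trace = pm.sample(1000, tune=1000)

# only rating differences are identified, so compare up to a shift,
# e.g. via the correlation between estimated and true ratings
print(np.corrcoef(trace['rating'].mean(axis=0), true_rating)[0, 1])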

Great, I will try those steps one at a time. I know the current model works under the inefficient method, so there must be something going on with AR1_vec. I feel like it’s something to do with using the logp, or with how the shape parameter is defined (i.e., each (team, period) entry being treated as its own AR draw rather than as part of one time series… if that makes any sense :thinking:)

Yeah, that would be a better model, as the drift rate for each team is probably different.
You can also try theano.scan instead of the for-loop; there is an example here: https://github.com/pymc-devs/pymc3/blob/master/pymc3/examples/arma_example.py
But it won’t be as fast as reparameterizing it into a matrix operation… (see the sketch below)
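For illustration, an AR(1) logp written with theano.scan might look roughly like this (an untested sketch; pm.Flat plus pm.Potential is just one way to attach the custom likelihood):

import theano

with pm.Model() as scan_model:
    rho = pm.Beta('rho', 5, 8)
    sigma = pm.HalfCauchy('sigma', 0.5)
    rating = pm.Flat('rating', shape=(n_teams, n_periods))

    # scan over periods: each step maps last period's ratings to this
    # period's conditional mean
    means, _ = theano.scan(fn=lambda x_prev: rho * x_prev,
                           sequences=[rating[:, :-1].T])

    init_logp = pm.Normal.dist(0., 1.).logp(rating[:, 0])
    innov_logp = pm.Normal.dist(means.T, sd=sigma).logp(rating[:, 1:])
    pm.Potential('ar1_logp', tt.sum(init_logp) + tt.sum(innov_logp))

Here the step function does not depend on its own previous output, so the scan is equivalent to the plain expression rho * rating[:, :-1], which is exactly why the vectorized matrix form wins.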


Thanks for your previous advice! I might have narrowed down the issue. Doing what you said, I stripped out all the accessory distributions and priors and started adding them back one by one. Here’s something I noticed: adding a stochastic term for the standard deviation makes the variational inference/initialization go crazy. Here’s the code:

with pm.Model() as rating_model:

    rho = pm.Uniform('rho', 0, 1)

    def AR1_vec(value):
        """vector version of AR1, with both sds fixed to 1
        """
        x_im1 = value[:, :-1]
        x_i = value[:, 1:]
        boundary = pm.Normal.dist(0, 1).logp  # logp for rating_0
        innov_like = pm.Normal.dist(rho * x_im1, sd=1).logp(x_i)
        return boundary(value[:, 0]) + tt.sum(innov_like)

    time_rating = pm.DensityDist('time_rating', AR1_vec, shape=(n_teams, n_periods))

Adding in the tau (standard deviation) term makes the ADVI loss go negative (and it keeps falling) while the tau posterior goes toward 0. If we set sd=1, for example, it converges on a reasonable positive loss, and the ADVI-inferred posterior is close to actual NUTS sampling (on the original model I posted).
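A standard thing to try for the funnel geometry that a free standard deviation tends to create is a non-centered parameterization, building the AR(1) path deterministically from standardized innovations. A rough, untested sketch, assuming the same data vectors as before and keeping a N(0, 1) start instead of the per-team omega:

with pm.Model() as nc_model:
    rho = pm.Beta('rho', 5, 8)
    sigma = pm.HalfCauchy('sigma', 0.5)
    # standardized innovations, one per (team, period)
    z = pm.Normal('z', 0., 1., shape=(n_teams, n_periods))

    # lower-triangular matrix with L[t, k] = rho**(t - k) for k <= t,
    # so that x_t = rho**t * z_0 + sigma * sum_{k=1..t} rho**(t - k) * z_k
    t_idx = np.arange(n_periods)
    expo = t_idx[:, None] - t_idx[None, :]
    mask = (expo >= 0).astype('float64')
    L = tt.pow(rho, expo * mask) * mask

    # the first column keeps sd 1 (the start); later columns are scaled by sigma
    scale = tt.concatenate([tt.ones((1,)), sigma * tt.ones((n_periods - 1,))])
    time_rating = pm.Deterministic('time_rating', tt.dot(z * scale, L.T))

    diff = time_rating[obs_team_1, obs_period] - time_rating[obs_team_2, obs_period]
    p = 0.5 * pm.math.tanh(diff) + 0.5
    wl = pm.Bernoulli('observed wl', p=p, observed=obs_winner)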

Edit: It seems like the code you posted (now omitting any stochastic sd) is really slow at sampling and often gets stuck on certain samples, whereas my code can still consistently sample at 1.57 it/s. Perhaps the logp is doing something here that we don’t know about.
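One way to check what the logp is actually doing is to profile it. PyMC3 models wrap Theano's profiler, so something like the following should print a per-op timing breakdown:

with rating_model:
    # compile and repeatedly evaluate the joint log-probability,
    # then print where the time is spent, op by op
    rating_model.profile(rating_model.logpt).summary()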