I am interested in creating a hierarchical model for tennis matches. I am trying to emulate the model below, written in Stan. How would I go about this in PyMC? I am guessing I should use a GaussianRandomWalk but am unsure how to implement it.

```
data {
  int<lower=0> num_matches;
  int<lower=0> num_players;
  int<lower=0> num_periods;
  int s_id[num_matches];   // server ids
  int r_id[num_matches];   // returner ids
  int period[num_matches]; // the current period
  int spw[num_matches];    // serve points won
  int spt[num_matches];    // serve points total
}
parameters {
  matrix[num_periods, num_players] s; // serving ability
  matrix[num_periods, num_players] r; // returning ability
  real<lower=0> sigma_s;   // sd for hyperprior on initial s
  real<lower=0> sigma_r;   // sd for hyperprior on initial r
  real<lower=0> sigma_t_s; // the time dependence on serve
  real<lower=0> sigma_t_r; // the time dependence on return
  real theta_0;            // intercept for serve probability
}
model {
  vector[num_matches] cur_logit;
  // priors
  sigma_s ~ normal(0, 1);
  sigma_r ~ normal(0, 1);
  sigma_t_s ~ normal(0, 1);
  sigma_t_r ~ normal(0, 1);
  theta_0 ~ normal(0, 1);
  // Initial prior
  s[1, :] ~ normal(0, sigma_s);
  r[1, :] ~ normal(0, sigma_r);
  // Time development
  for (i in 2:num_periods) {
    s[i, :] ~ normal(s[i-1, :], sigma_t_s);
    r[i, :] ~ normal(r[i-1, :], sigma_t_r);
  }
  for (i in 1:num_matches) {
    cur_logit[i] = s[period[i], s_id[i]] - r[period[i], r_id[i]] + theta_0;
  }
  // Likelihood
  spw ~ binomial_logit(spt, cur_logit);
}
```

Hi Blake,

I’m pretty sure I wrote this model! Where exactly are you stuck? Is it the random walk part? I think this should get you started:

```
import pymc as pm

# Change these to your values
n_players = 10
n_periods = 20

with pm.Model() as model:
    # Standard deviation for the initial values
    sigma_s = pm.HalfNormal('sigma_s', sigma=1.)
    # Standard deviation of the random walk steps
    sigma_t_s = pm.HalfNormal('sigma_t_s', sigma=1.)
    s = pm.GaussianRandomWalk('s', sigma=sigma_t_s, shape=(n_periods, n_players),
                              init_dist=pm.Normal.dist(0, sigma_s))
    # ... The other random walk should be similar.
```

I haven’t tested this a great deal, so there may be a mistake, but I think this should work. This will give you the matrix `s`, which *should* be the same as the Stan one. I was a little unsure about whether the random walk is correctly vectorised, but according to this thread: Can I instantiate multiple random walks with different drifts, it seems that this should return `n_players` independent random walks, which is what you want. You then need to construct `r` in the same way. Let me know if that helps.

Best,

Martin


This is your model! Great stuff btw, I learned a lot from reading your paper on your tennis model!

I did something similar but didn’t have the init_dist part and just used sigma_s. It was sampling extremely slowly, though definitely faster with the init_dist. That might be expected since the dataset has almost 10,000 matches in it.

I made a mistake with the shape; it ran a bit quicker after fixing that.

Didn’t end up with good sampling though. R-hats were all high for the sigmas. Thinking maybe a non-centered version could help, but not really sure how I can do this with the random walk.

Hi Blake,

I’m happy to hear you found the paper useful!

Interesting that you’re getting slow / bad sampling. Would you be able to post your code to check whether everything looks OK? With 10,000 matches, it’s entirely possible that sampling will just be slow (especially if there are many periods – how many are you using?), but perhaps there are some more things we could tweak.

Best,

Martin

Here’s the model. There are 16 periods in the date variable. Is a non-centered approach possible with a random walk?

```
coords = {"server": servers, "returner": returners, "date": date}

with pm.Model(coords=coords) as model:
    sigma_s = pm.HalfNormal('sigma_s', 1)
    sigma_r = pm.HalfNormal('sigma_r', 1)
    sigma_t_s = pm.HalfNormal('sigma_t_s', 1)
    sigma_t_r = pm.HalfNormal('sigma_t_r', 1)
    intercept = pm.Normal('intercept', 0, 1)
    s = pm.GaussianRandomWalk('s', sigma=sigma_t_s, dims=("date", "server"),
                              init_dist=pm.Normal.dist(0, sigma_s))
    r = pm.GaussianRandomWalk('r', sigma=sigma_t_r, dims=("date", "returner"),
                              init_dist=pm.Normal.dist(0, sigma_r))
    p = pm.Deterministic('p', pm.math.invlogit(
        s[date_idx, server_idx] - r[date_idx, returner_idx] + intercept))
    spw = pm.Binomial('spw', combined['sp_total'].values, p,
                      observed=combined['sp_won'].values)
    trace = pm.sample()
```

Hi Blake,

That looks pretty good to me at first glance (I’ll try to take a closer look later). The only thing I’d suggest is to use the `logit_p` functionality in the Binomial rather than using the inverse logit. I’ve found that to be more numerically stable. I’d be surprised if it makes a big difference in this case, since the probabilities are unlikely to be extreme, but it’s still worth a go I think. So I suggest replacing:

```
p = pm.Deterministic('p', pm.math.invlogit(s[date_idx, server_idx] - r[date_idx, returner_idx] + intercept))
spw = pm.Binomial('spw', combined['sp_total'].values, p, observed=combined['sp_won'].values)
```

with something like:

```
logits = pm.Deterministic('logits', s[date_idx, server_idx] - r[date_idx, returner_idx] + intercept)
spw = pm.Binomial('spw', combined['sp_total'].values, logit_p=logits, observed=combined['sp_won'].values)
```
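To see why the logit-scale version is more numerically stable, here's a quick standalone check (using scipy's `expit`/`log_expit` purely for illustration): once the probability saturates to exactly 1.0 in float64, the `log(1 - p)` term of the likelihood is lost, whereas working on the logit scale keeps it.

```python
import numpy as np
from scipy.special import expit, log_expit

x = 40.0  # a large (but finite) logit

p = expit(x)          # inverse logit saturates to exactly 1.0 in float64
print(p == 1.0)       # True, so log(1 - p) = log(0) = -inf
print(log_expit(-x))  # stable evaluation of log(1 - p): about -40
```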

R-hats were a bit lower, but still not great convergence.

A GRW is just the cumulative sum of a bunch of Gaussians, so you could do something like:

```
# inside a pm.Model() context; T is the number of time steps
mu = pm.Normal('mu', mu=0, sigma=1)
sigma = pm.HalfNormal('sigma', sigma=1)
offset = pm.Normal('offset', mu=0, sigma=1, size=T)
grw = (mu + sigma * offset).cumsum()
```

Make sure to specify the time axis in `.cumsum()` if you have more dimensions than my simple example.
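To see the equivalence concretely, here's a quick numpy-only check (constants are arbitrary) that the cumulative sum of scaled standard normals reproduces the random walk's marginal mean and variance, which grow linearly with the number of steps:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T, n_draws = 0.5, 2.0, 50, 100_000

# Non-centered construction: scale standard normals, then cumsum along the time axis
offset = rng.standard_normal((n_draws, T))
grw = (mu + sigma * offset).cumsum(axis=1)

# After T steps the marginal mean should be T * mu and the variance T * sigma**2
print(grw[:, -1].mean())  # close to T * mu = 25
print(grw[:, -1].var())   # close to T * sigma**2 = 200
```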

How could I specify the initial distribution in this form?

Also, if I had 5 periods and 7 players for example, would I use shape=(5, 7) for the offset?

@jessegrabowski looks good, but isn’t that what GaussianRandomWalk does internally? (Sorry, it’s been a while since I looked but I thought that’s how it was implemented)

It could be; I’m not familiar enough with the class inheritance structure of random variables to understand what is going on at just a glance of the code. Looking at the code here, it seems like the `logp` and `rng_fn` functions use centered normals, but I could be looking at totally the wrong place, or not understanding what I am reading. Perhaps @ricardoV94 knows if there is any merit to doing something like I proposed?

(I also looked for a discussion in the Stan community about centered vs non-centered GRW, but didn’t come up with anything, for what it’s worth.)

With these caveats, you would add an initial distribution by defining it as you wish, then manually concatenating it to the beginning of `mu + sigma * offset`, then taking the cumsum of the new vector.

If you want to pool information from the player and period dimensions together, yes I think so.


For non-observed GRW it may be better to define in the non-centered way, which is NOT what pm.GaussianRandomWalk does as you said. For observed, only centered works, but it wouldn’t matter as we are not sampling those.

I have thought about it and considered using different forms for observed and unobserved distributions.

Would be good to confirm NUTS indeed prefers the non-centered case. I would guess so as it can do proposals closer to the unit range.

This might also make sense for Censored distributions as NUTS can’t sample the direct logp, but it may be happy to sample a latent uncensored distribution that is then clipped (or perhaps not, but that’s what we suggest in the docs anyway)


We could also implement a DiffTransform to the same effect and hope/check if Aesara is clever enough to skip a diff of a cumsum
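(For what it's worth, the identity being relied on is just that `diff` undoes `cumsum`, e.g. in numpy:)

```python
import numpy as np

steps = np.array([0.3, -1.2, 0.5, 2.0])
walk = steps.cumsum()  # [0.3, -0.9, -0.4, 1.6]
# diff of the cumulative sum recovers the original increments
print(np.allclose(np.diff(walk), steps[1:]))  # True
```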

I am pretty new to all this, so I’m definitely doing this completely wrong. This is what I tried:

```
import aesara.tensor as at

with pm.Model(coords=coords) as model:
    sigma_s = pm.HalfNormal('sigma_s', 1)
    sigma_r = pm.HalfNormal('sigma_r', 1)
    sigma_t_s = pm.HalfNormal('sigma_t_s', 1)
    sigma_t_r = pm.HalfNormal('sigma_t_r', 1)
    intercept = pm.Normal('intercept', 0, 1)
    # Standard-normal pieces for the non-centered parameterisation
    init_s = pm.Normal('init_s', 0, 1, shape=(1, len(servers)))
    init_r = pm.Normal('init_r', 0, 1, shape=(1, len(returners)))
    offset_s = pm.Normal('offset_s', 0, 1, shape=(len(date) - 1, len(servers)))
    offset_r = pm.Normal('offset_r', 0, 1, shape=(len(date) - 1, len(returners)))
    # Scale each piece by its own sigma, then cumsum over the time axis (axis 0)
    s = at.concatenate([init_s * sigma_s, offset_s * sigma_t_s]).cumsum(axis=0)
    r = at.concatenate([init_r * sigma_r, offset_r * sigma_t_r]).cumsum(axis=0)
    logits = pm.Deterministic('logits', s[date_idx, server_idx] - r[date_idx, returner_idx] + intercept)
    spw = pm.Binomial('spw', combined['sp_total'].values, logit_p=logits,
                      observed=combined['sp_won'].values)
```