What to do with low acceptance probabilities?

I am currently developing a hierarchical ratings model very similar to the one here and finding it difficult to efficiently sample from the posterior of the ratings.

I am using the NUTS sampler and have tried:

1. reparameterization (the non-centered version)
2. turning up the target_accept parameter
3. a HalfNormal prior for omega (see the sketch after this list)
4. new variables with priors
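
A minimal sketch of what item 3 might look like; the sd=1 scale is an assumed placeholder rather than the exact value used, and n_teams comes from the data-prep code below:

import pymc3 as pm

with pm.Model() as halfnormal_sketch:
    # HalfNormal prior on the rating scale instead of InverseGamma
    omega = pm.HalfNormal('omega', sd=1)                    # assumed scale
    rating = pm.Normal('rating', 0, omega, shape=n_teams)   # same team ratings as in the models below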

Example Data: (screenshot attached in the original post)

Data Prep Code:

import numpy as np

# h_matches and obs are pandas DataFrames of match results
teams = np.sort(np.unique(np.concatenate([h_matches['Team 1 ID'], h_matches['Team 2 ID']])))
maps = obs.Map.unique()

# map raw team/map IDs to contiguous integer indices
tmap = {v: k for k, v in dict(enumerate(teams)).items()}
mmap = {v: k for k, v in dict(enumerate(maps)).items()}

n_teams = len(teams)
n_maps = len(maps)
print('Number of Teams: %i' % n_teams)
print('Number of Matches: %i' % len(obs))
print('Number of Maps: %i' % n_maps)
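
To illustrate what the mappings do (the IDs here are made up, not from the real data), the raw IDs become positional indices into the rating arrays:

import numpy as np
import pandas as pd

# Hypothetical team IDs
example_teams = np.array([3, 7, 12])
example_tmap = {v: k for k, v in dict(enumerate(example_teams)).items()}   # {3: 0, 7: 1, 12: 2}

example_team1 = pd.Series([3, 12, 7, 3])
print(example_team1.map(example_tmap).values)                              # [0 2 1 0]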

Here is my original model:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as rating_model:

    # scale of the team ratings
    omega = pm.InverseGamma('omega', 4, 2)
    # latent overall team strength
    rating = pm.Normal('rating', 0, omega, shape=n_teams)
    # per-map team strength, centered parameterization
    rating_map = pm.Normal('rating | map', rating, 0.5, shape=(n_maps, n_teams))

    # rating difference between the two teams on the map that was played
    diff = (rating_map[obs['Map'].map(mmap).values, obs['Team 1 ID'].map(tmap).values]
            - rating_map[obs['Map'].map(mmap).values, obs['Team 2 ID'].map(tmap).values])
    p = tt.nnet.sigmoid(diff)

    err = pm.Bernoulli('observed', p=p, observed=(obs['Team 1 ID'] == obs['winner']).values)
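
To make the likelihood concrete: a win is modelled as a Bernoulli draw whose probability is a logistic function of the rating gap. Illustrative numbers only:

import numpy as np

# Win probability of team 1 as a logistic function of the rating difference
def win_prob(r1, r2):
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

print(win_prob(1.0, 0.0))   # ~0.73
print(win_prob(0.0, 0.0))   # 0.5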


with rating_model:
    trace = pm.sample(10000, init='advi', nuts_kwargs={'target_accept': 0.99}, tune=0)

and the non-centered one:

with pm.Model() as rating_model:

    omega = pm.InverseGamma('omega', 4, 2)
    rating = pm.Normal('rating', 0, omega, shape=n_teams)

    # non-centered parameterization: rating + 0.5 * N(0, 1)
    # is equivalent to rating_map ~ Normal(rating, 0.5)
    theta_tilde = pm.Normal('rate_t', mu=0, sd=1, shape=(n_maps, n_teams))
    rating_map = pm.Deterministic('rating | map', rating + 0.5 * theta_tilde)

    diff = (rating_map[obs['Map'].map(mmap).values, obs['Team 1 ID'].map(tmap).values]
            - rating_map[obs['Map'].map(mmap).values, obs['Team 2 ID'].map(tmap).values])
    p = tt.nnet.sigmoid(diff)

    err = pm.Bernoulli('observed', p=p, observed=(obs['Team 1 ID'] == obs['winner']).values)


with rating_model:
    trace = pm.sample(10000, init='advi', nuts_kwargs={'target_accept': 0.99}, tune=0)

I get the warning UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.00052671795651, but should be close to 0.99. Try to increase the number of tuning steps. which indicates that my acceptance probability is extremely low and that I should reparameterize or increase target_accept.
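
A quick way to confirm what the warning is reporting is to look at the NUTS statistics stored on the trace (a sketch against the PyMC3 API; the numbers will obviously differ per run):

# per-draw acceptance probability and divergent transitions
accept = trace.get_sampler_stats('mean_tree_accept')
divergent = trace.get_sampler_stats('diverging')

print('mean acceptance: %.4f' % accept.mean())
print('divergences: %i' % divergent.sum())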

Is there anything I can do to improve this model? It currently samples fine for a small number of data points, but anything above a thousand runs into issues.

Why did you set the number of tuning steps to 0? Because of that, the sampler never adapts the initial step size, and if that isn't right you will get a bad acceptance rate. What happens if you just use the defaults for all sampling parameters, i.e. pm.sample() or maybe pm.sample(1000, njobs=4, tune=1000)? This should also change the init method to something probably more appropriate (jitter+adapt_diag), at least on version 3.2.
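
In other words, something along these lines (a sketch of the suggested defaults):

with rating_model:
    # let NUTS tune its step size and mass matrix over 1000 tuning steps,
    # using the default init ('jitter+adapt_diag' on PyMC3 >= 3.2)
    trace = pm.sample(1000, njobs=4, tune=1000)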

Well, I feel stupid now; I always thought the tune parameter just indicated whether to have a burn-in or not. The model now samples well, but at a slow iterations/sec rate. According to @junpenglao, this could mean that the ADVI initialization is poor, since sampling speeds up after about 500 samples. Using jitter+adapt_diag could potentially solve this for me, since the ADVI loss hits a minimum and then starts increasing rather than converging.
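
For anyone wanting to check this themselves, running ADVI on its own exposes the loss history (a sketch against the PyMC3 API; the iteration count is just an example):

import matplotlib.pyplot as plt

with rating_model:
    # fit ADVI separately to inspect its convergence
    approx = pm.fit(n=50000, method='advi')

# approx.hist holds the loss (negative ELBO) per iteration
plt.plot(approx.hist)
plt.xlabel('iteration')
plt.ylabel('ADVI loss')
plt.show()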

Don't feel bad; reading the docstring, I can see how you got that idea. Would this have prevented the confusion?

Yes, I think that makes it much clearer to the reader. I have also read code before that specifically removed the tuning samples. Thanks.