What to do with low acceptance probabilities?

I am currently developing a hierarchical ratings model very similar to the one here and finding it difficult to efficiently sample from the posterior of the ratings.

I am using the NUTS sampler and have tried:

1. reparameterization (the non-centered version)
2. turning up the target_accept parameter
3. a HalfNormal prior for omega (see the sketch after this list)
4. new variables with priors
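
A minimal sketch of what item 3 might look like; the sd=1 scale is an assumed placeholder rather than the exact value used, and n_teams comes from the data-prep code below:

import pymc3 as pm

with pm.Model() as halfnormal_sketch:
    # HalfNormal prior on the rating scale instead of InverseGamma
    omega = pm.HalfNormal('omega', sd=1)                    # assumed scale
    rating = pm.Normal('rating', 0, omega, shape=n_teams)   # same team ratings as in the models below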

Example Data: (screenshot attached in the original post)

Data Prep Code:

import numpy as np

# h_matches and obs are pandas DataFrames of match results
teams = np.sort(np.unique(np.concatenate([h_matches['Team 1 ID'], h_matches['Team 2 ID']])))
maps = obs.Map.unique()

# map raw team/map IDs to contiguous integer indices
tmap = {v: k for k, v in dict(enumerate(teams)).items()}
mmap = {v: k for k, v in dict(enumerate(maps)).items()}

n_teams = len(teams)
n_maps = len(maps)
print('Number of Teams: %i' % n_teams)
print('Number of Matches: %i' % len(obs))
print('Number of Maps: %i' % n_maps)
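
To illustrate what the mappings do (the IDs here are made up, not from the real data), the raw IDs become positional indices into the rating arrays:

import numpy as np
import pandas as pd

# Hypothetical team IDs
example_teams = np.array([3, 7, 12])
example_tmap = {v: k for k, v in dict(enumerate(example_teams)).items()}   # {3: 0, 7: 1, 12: 2}

example_team1 = pd.Series([3, 12, 7, 3])
print(example_team1.map(example_tmap).values)                              # [0 2 1 0]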

Here is my original model:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as rating_model:

    # scale of the team ratings
    omega = pm.InverseGamma('omega', 4, 2)
    # latent overall team strength
    rating = pm.Normal('rating', 0, omega, shape=n_teams)
    # per-map team strength, centered parameterization
    rating_map = pm.Normal('rating | map', rating, 0.5, shape=(n_maps, n_teams))

    # rating difference between the two teams on the map that was played
    diff = (rating_map[obs['Map'].map(mmap).values, obs['Team 1 ID'].map(tmap).values]
            - rating_map[obs['Map'].map(mmap).values, obs['Team 2 ID'].map(tmap).values])
    p = tt.nnet.sigmoid(diff)

    err = pm.Bernoulli('observed', p=p, observed=(obs['Team 1 ID'] == obs['winner']).values)
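
To make the likelihood concrete: a win is modelled as a Bernoulli draw whose probability is a logistic function of the rating gap. Illustrative numbers only:

import numpy as np

# Win probability of team 1 as a logistic function of the rating difference
def win_prob(r1, r2):
    return 1.0 / (1.0 + np.exp(-(r1 - r2)))

print(win_prob(1.0, 0.0))   # ~0.73
print(win_prob(0.0, 0.0))   # 0.5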


with rating_model:
    trace = pm.sample(10000, init='advi', nuts_kwargs={'target_accept': 0.99}, tune=0)

and the non-centered one:

with pm.Model() as rating_model:

    omega = pm.InverseGamma('omega', 4, 2)
    rating = pm.Normal('rating', 0, omega, shape=n_teams)

    # non-centered parameterization: rating + 0.5 * N(0, 1)
    # is equivalent to rating_map ~ Normal(rating, 0.5)
    theta_tilde = pm.Normal('rate_t', mu=0, sd=1, shape=(n_maps, n_teams))
    rating_map = pm.Deterministic('rating | map', rating + 0.5 * theta_tilde)

    diff = (rating_map[obs['Map'].map(mmap).values, obs['Team 1 ID'].map(tmap).values]
            - rating_map[obs['Map'].map(mmap).values, obs['Team 2 ID'].map(tmap).values])
    p = tt.nnet.sigmoid(diff)

    err = pm.Bernoulli('observed', p=p, observed=(obs['Team 1 ID'] == obs['winner']).values)


with rating_model:
    trace = pm.sample(10000, init='advi', nuts_kwargs={'target_accept': 0.99}, tune=0)

I get the warning UserWarning: The acceptance probability in chain 0 does not match the target. It is 0.00052671795651, but should be close to 0.99. Try to increase the number of tuning steps. which indicates that my acceptance probability is extremely low and that I should reparameterize or increase target_accept.
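
A quick way to confirm what the warning is reporting is to look at the NUTS statistics stored on the trace (a sketch against the PyMC3 API; the numbers will obviously differ per run):

# per-draw acceptance probability and divergent transitions
accept = trace.get_sampler_stats('mean_tree_accept')
divergent = trace.get_sampler_stats('diverging')

print('mean acceptance: %.4f' % accept.mean())
print('divergences: %i' % divergent.sum())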

Is there anything I can do to improve this model? It currently samples fine for a small number of data points, but anything above a thousand runs into issues.

Why did you set the number of tuning steps to 0? Because of that, the sampler never adapts the initial step size, and if that isn't right you will get a bad acceptance rate. What happens if you just use the defaults for all sampling parameters, i.e. pm.sample() or maybe pm.sample(1000, njobs=4, tune=1000)? This should also change the init method to something probably more appropriate (jitter+adapt_diag), at least on version 3.2.
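
In other words, something along these lines (a sketch of the suggested defaults):

with rating_model:
    # let NUTS tune its step size and mass matrix over 1000 tuning steps,
    # using the default init ('jitter+adapt_diag' on PyMC3 >= 3.2)
    trace = pm.sample(1000, njobs=4, tune=1000)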

Well, I feel stupid now; I always thought the tune parameter just indicated whether to have a burn-in or not. The model now samples well, but at a slow iterations/sec rate. According to @junpenglao, this could mean that the ADVI initialization is poor, since sampling speeds up after about 500 samples. Using jitter+adapt_diag could potentially solve this for me, since the ADVI loss hits a minimum and then starts increasing rather than converging.
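
For anyone wanting to check this themselves, running ADVI on its own exposes the loss history (a sketch against the PyMC3 API; the iteration count is just an example):

import matplotlib.pyplot as plt

with rating_model:
    # fit ADVI separately to inspect its convergence
    approx = pm.fit(n=50000, method='advi')

# approx.hist holds the loss (negative ELBO) per iteration
plt.plot(approx.hist)
plt.xlabel('iteration')
plt.ylabel('ADVI loss')
plt.show()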

Don't feel bad; reading the docstring, I can see how you got that idea. Would this have prevented the confusion?

Yes, I think that makes it much clearer to the reader. I have also read code before that specifically removed the tuning samples. Thanks.