Few divergences, high acceptance probability, but low effective sample size: How is this possible?

New to PyMC3. As far as I understand it, a low effective sample size occurs when a lot of samples are rejected, which should be reflected either in a low acceptance probability or in a large number of divergences. However, I get the following warnings with my model (see below for code):

100.00% [48000/48000 5:00:41<00:00 Sampling 4 chains, 7 divergences]
Sampling 4 chains for 2_000 tune and 10_000 draw iterations (8_000 + 40_000 draws total) took 18042 seconds.
The acceptance probability does not match the target. It is 0.898019663621968, but should be close to 0.95. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

As you can see, the acceptance probability is almost 90% and there are only 7 divergences, yet the effective sample size is low: as low as 4 or 5 for most parameters. How is this possible? Isn't such a low ESS at odds with the decent acceptance probability and the small number of divergences?

Code

# Hierarchical Logistic Regression Model - Noncentered without increased target_accept
import pandas as pd
import pymc3 as pm

# Per-trial and per-subject integer indices for the hierarchical structure
idxt = pd.Categorical(data['ID']).codes
idxs = pd.Categorical(data['ID'].unique()).codes
n_subjects = data['ID'].unique().size

# Convert response variable to 0/1 codes for the logistic regression model
y0 = pd.Categorical(data['response_stage1']).codes

# Define model
with pm.Model() as model_1_hierarchy_noncentered:
    ## Group-level priors (one location/scale pair per subject and coefficient)
    b0mu = pm.Normal('b0mu', mu=0, sd=10, shape=n_subjects)
    b0sd = pm.HalfNormal('b0sd', sd=10, shape=n_subjects)

    b1mu = pm.Normal('b1mu', mu=0, sd=10, shape=n_subjects)
    b1sd = pm.HalfNormal('b1sd', sd=10, shape=n_subjects)

    b2mu = pm.Normal('b2mu', mu=0, sd=10, shape=n_subjects)
    b2sd = pm.HalfNormal('b2sd', sd=10, shape=n_subjects)

    b3mu = pm.Normal('b3mu', mu=0, sd=10, shape=n_subjects)
    b3sd = pm.HalfNormal('b3sd', sd=10, shape=n_subjects)

    ## Non-centered parameterization: beta = mu + z * sd with z ~ N(0, 1)
    beta0 = b0mu[idxs] + pm.Normal('beta0', mu=0, sd=1, shape=len(idxs)) * b0sd[idxs]
    beta1 = b1mu[idxs] + pm.Normal('beta1', mu=0, sd=1, shape=len(idxs)) * b1sd[idxs]
    beta2 = b2mu[idxs] + pm.Normal('beta2', mu=0, sd=1, shape=len(idxs)) * b2sd[idxs]
    beta3 = b3mu[idxs] + pm.Normal('beta3', mu=0, sd=1, shape=len(idxs)) * b3sd[idxs]

    ## Linear predictor on the logit scale: per-trial intercept plus
    ## elementwise covariate * coefficient terms (pm.math.dot would
    ## collapse each product to a single scalar across all trials)
    mu = (beta0[idxt]
          + data['prev_stage1_action'].values * beta1[idxt]
          + data['prev_R'].values * beta2[idxt]
          + data['prev_trans'].values * beta3[idxt])

    theta = pm.math.sigmoid(mu)

    ## Bernoulli likelihood
    yl = pm.Bernoulli('yl', theta, observed=y0)

    trace = pm.sample(10000, cores=4, tune=2000, target_accept=0.95)

Low effective sample size typically comes from highly autocorrelated chains, not from rejected proposals. Acceptance probability, divergences, and ESS each give you a different perspective on what happened during sampling; they aren't redundant. In your case, the repeated max-tree-depth warnings and the R-hat above 1.4 are the real red flags: the chains are moving slowly and not mixing, so successive draws are nearly identical and 40,000 draws can collapse to an ESS of 4 or 5. Also, 7 divergences isn't great. For a deep dive into sampling diagnostics, I would highly recommend Aki's keynote from last year's PyMCon.
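
You can check the autocorrelation and per-parameter ESS yourself. Here's a minimal sketch using ArviZ (installed alongside PyMC3), assuming the trace and model from your code above:

import arviz as az

# Convert the PyMC3 trace; the summary reports ess_bulk, ess_tail,
# and r_hat for each parameter
idata = az.from_pymc3(trace, model=model_1_hierarchy_noncentered)
print(az.summary(idata, var_names=['b0mu', 'b0sd']))

# Slowly decaying autocorrelation is what shrinks tens of thousands of
# draws to a handful of effective samples: each new draw carries almost
# no new information about the posterior
az.plot_autocorr(idata, var_names=['b0mu'])

If the autocorrelation plots stay near 1 for many lags, that confirms the mixing problem the R-hat and tree-depth warnings are pointing at.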
