Few divergences, high acceptance probability, but low effective sample size: How is this possible?

New to PyMC3. As far as I understand it, a low effective sample size occurs when a lot of samples are rejected, which should be reflected either in a low acceptance probability or in a large number of divergences. However, I get the following warnings with my model (see below for code):

100.00% [48000/48000 5:00:41<00:00 Sampling 4 chains, 7 divergences]
Sampling 4 chains for 2_000 tune and 10_000 draw iterations (8_000 + 40_000 draws total) took 18042 seconds.
The acceptance probability does not match the target. It is 0.898019663621968, but should be close to 0.95. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.

As you can see, the acceptance probability is almost 90% and there are only 7 divergences, yet the effective sample size is low: as low as 4 or 5 for most parameters. How is this possible? Isn't such a low ESS at odds with the decent acceptance probability and the small number of divergences?

Code

# Hierarchical Logistic Regression Model - Noncentered without increased target_accept
import pandas as pd
import pymc3 as pm

# Per-trial and per-subject integer indices for the hierarchical structure
idxt = pd.Categorical(data['ID']).codes
idxs = pd.Categorical(data['ID'].unique()).codes
n_subjects = data['ID'].unique().size

# Convert response variable to 0/1 codes for the logistic regression model
y0 = pd.Categorical(data['response_stage1']).codes

# Define model
with pm.Model() as model_1_hierarchy_noncentered:
    ## Group-level priors (one location/scale pair per subject and coefficient)
    b0mu = pm.Normal('b0mu', mu=0, sd=10, shape=n_subjects)
    b0sd = pm.HalfNormal('b0sd', sd=10, shape=n_subjects)

    b1mu = pm.Normal('b1mu', mu=0, sd=10, shape=n_subjects)
    b1sd = pm.HalfNormal('b1sd', sd=10, shape=n_subjects)

    b2mu = pm.Normal('b2mu', mu=0, sd=10, shape=n_subjects)
    b2sd = pm.HalfNormal('b2sd', sd=10, shape=n_subjects)

    b3mu = pm.Normal('b3mu', mu=0, sd=10, shape=n_subjects)
    b3sd = pm.HalfNormal('b3sd', sd=10, shape=n_subjects)

    ## Non-centered parameterization: beta = mu + z * sd with z ~ N(0, 1)
    beta0 = b0mu[idxs] + pm.Normal('beta0', mu=0, sd=1, shape=len(idxs)) * b0sd[idxs]
    beta1 = b1mu[idxs] + pm.Normal('beta1', mu=0, sd=1, shape=len(idxs)) * b1sd[idxs]
    beta2 = b2mu[idxs] + pm.Normal('beta2', mu=0, sd=1, shape=len(idxs)) * b2sd[idxs]
    beta3 = b3mu[idxs] + pm.Normal('beta3', mu=0, sd=1, shape=len(idxs)) * b3sd[idxs]

    ## Linear predictor on the logit scale: per-trial intercept plus
    ## elementwise covariate * coefficient terms (pm.math.dot would
    ## collapse each product to a single scalar across all trials)
    mu = (beta0[idxt]
          + data['prev_stage1_action'].values * beta1[idxt]
          + data['prev_R'].values * beta2[idxt]
          + data['prev_trans'].values * beta3[idxt])

    theta = pm.math.sigmoid(mu)

    ## Bernoulli likelihood
    yl = pm.Bernoulli('yl', theta, observed=y0)

    trace = pm.sample(10000, cores=4, tune=2000, target_accept=0.95)

Low effective sample size typically comes from highly autocorrelated chains, not from rejected proposals. Acceptance probability, divergences, and ESS each give you a different perspective on what happened during sampling; they aren't redundant. In your case, the repeated max-tree-depth warnings and the R-hat above 1.4 are the real red flags: the chains are moving slowly and not mixing, so successive draws are nearly identical and 40,000 draws can collapse to an ESS of 4 or 5. Also, 7 divergences isn't great. For a deep dive into sampling diagnostics, I would highly recommend Aki's keynote from last year's PyMCon.
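
You can check the autocorrelation and per-parameter ESS yourself. Here's a minimal sketch using ArviZ (installed alongside PyMC3), assuming the trace and model from your code above:

import arviz as az

# Convert the PyMC3 trace; the summary reports ess_bulk, ess_tail,
# and r_hat for each parameter
idata = az.from_pymc3(trace, model=model_1_hierarchy_noncentered)
print(az.summary(idata, var_names=['b0mu', 'b0sd']))

# Slowly decaying autocorrelation is what shrinks tens of thousands of
# draws to a handful of effective samples: each new draw carries almost
# no new information about the posterior
az.plot_autocorr(idata, var_names=['b0mu'])

If the autocorrelation plots stay near 1 for many lags, that confirms the mixing problem the R-hat and tree-depth warnings are pointing at.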
