New to PyMC3. As far as I understand it, a low effective sample size occurs when a lot of samples are rejected, which should be reflected either in a low acceptance probability, or a large number of deviations. However, I get the following warnings with my model (see below for code):
100.00% [48000/48000 5:00:41<00:00 Sampling 4 chains, 7 divergences]
Sampling 4 chains for 2_000 tune and 10_000 draw iterations (8_000 + 40_000 draws total) took 18042 seconds.
The acceptance probability does not match the target. It is 0.898019663621968, but should be close to 0.95. Try to increase the number of tuning steps.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
There were 7 divergences after tuning. Increase `target_accept` or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The chain reached the maximum tree depth. Increase max_treedepth, increase target_accept or reparameterize.
The rhat statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
As you can see, the acceptance probability is almost 90%, there are only 7 divergences, and yet the effective sample size is low. In fact, it is as low as 4 or 5 for most parameters. How is this possible, and how is this not at odds with the decent acceptance probability and the low number of divergences?
Code
# Hierarchical Logistic Regression Model - Noncentered without increased target_accept
idxt = pd.Categorical(data['ID']).codes
idxs = pd.Categorical(data['ID'].unique()).codes
# Convert response variable to categorical variable for logistic regression model
y0 = pd.Categorical(data['response_stage1']).codes
# Define model
with pm.Model() as model_1_hierarchy_noncentered:
## Define group-level priors
b0mu = pm.Normal('b0mu', mu = 0, sd = 10, shape = data['ID'].unique().size)
b0sd = pm.HalfNormal('b0sd', sd = 10, shape = data['ID'].unique().size)
b1mu = pm.Normal('b1mu', mu = 0, sd = 10, shape = data['ID'].unique().size)
b1sd = pm.HalfNormal('b1sd', sd = 10, shape = data['ID'].unique().size)
b2mu = pm.Normal('b2mu', mu = 0, sd = 10, shape = data['ID'].unique().size)
b2sd = pm.HalfNormal('b2sd', sd = 10, shape = data['ID'].unique().size)
b3mu = pm.Normal('b3mu', mu = 0, sd = 10, shape = data['ID'].unique().size)
b3sd = pm.HalfNormal('b3sd', sd = 10, shape = data['ID'].unique().size)
## Define Priors over parameters
beta0 = b0mu[idxs] + pm.Normal('beta0', mu=0, sd=1, shape = len(idxs))*b0sd[idxs]
beta1 = b1mu[idxs] + pm.Normal('beta1', mu=0, sd=1, shape = len(idxs))*b1sd[idxs]
beta2 = b2mu[idxs] + pm.Normal('beta2', mu=0, sd=1, shape = len(idxs))*b2sd[idxs]
beta3 = b3mu[idxs] + pm.Normal('beta3', mu=0, sd=1, shape = len(idxs))*b3sd[idxs]
mu = beta0[idxt] + pm.math.dot(data['prev_stage1_action'], beta1[idxt]) + pm.math.dot(data['prev_R'], beta2[idxt]) + pm.math.dot(data['prev_trans'], beta3[idxt])
theta = pm.math.sigmoid(mu)
## Define Likelihood function
yl = pm.Bernoulli('yl', theta, observed=y0)
trace = pm.sample(10000, cores = 4, tune = 2000, target_accept=0.95)