Hi all,
I have a dataset composed of 2 conditions. Each condition consists of 3 samples, and each sample is subdivided into different categories. I would like to compute 95% credible intervals for the proportion of each category within each condition, so that I can see whether any category has non-overlapping credible intervals between conditions. Ideally, I want to show this downstream as a barplot whose error bars represent the 95% credible interval (see attached figure).
First of all, the categories in each sample are not independent: if I have more counts in one category, I have fewer counts in the others, so the proportions are dependent as well. Because of this, I don't think I can use the binomial proportion confidence interval.
Can I use the 95% credible interval in this case, with dependent proportions?
I have a model defined as follows:
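To make concrete what I am after, here is a minimal sketch that sidesteps MCMC entirely. It assumes a flat Dirichlet prior and pools the three samples of each condition, so the posterior is Dirichlet(1 + counts) by conjugacy. Both are simplifying assumptions for illustration, not my actual model, and the counts are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled counts: one row per condition, one column per category.
counts = np.array([
    [120, 60, 20],  # condition A
    [70, 90, 40],   # condition B
])

# Conjugacy: a flat Dirichlet(1, ..., 1) prior plus multinomial counts
# gives a Dirichlet(1 + counts) posterior for each condition.
draws = np.stack([rng.dirichlet(1 + row, size=20_000) for row in counts])
# draws has shape (n_conditions, n_draws, n_categories)

# Equal-tailed 95% credible interval for each marginal proportion.
lower, upper = np.percentile(draws, [2.5, 97.5], axis=1)
mean = draws.mean(axis=1)

# A category's intervals do not overlap between the two conditions
# if one interval lies entirely above the other.
non_overlapping = (lower[0] > upper[1]) | (lower[1] > upper[0])
```

Each draw still sums to 1 across categories, so the dependence is kept in the joint posterior and the intervals are just marginal summaries of it. For the barplot, I would pass `[mean - lower, upper - mean]` for one condition as asymmetric error bars.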
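    import pymc3 as pm  # assuming PyMC3, based on the sampler output below

    # Map each condition to its row index in phi_gb (built once, outside the loop).
    condition_index = {c: i for i, c in enumerate(conditions)}

    with pm.Model() as model_0:
        # Hyperprior on the rate of the Dirichlet concentration parameters.
        v_g = pm.Gamma('v_g', 1, .1)
        a_gb = pm.Gamma('a_gb', 1, v_g, shape=len(classes))
        # One vector of class proportions per condition.
        phi_gb = pm.Dirichlet(
            'phi_g.b',
            a=a_gb,
            shape=(len(conditions), len(classes)),
        )
        # One multinomial likelihood per batch, using its condition's proportions.
        for b in batches:
            g = condition_index[bg_mapping[b]]
            n_gb = pm.Multinomial(
                f'n_{g}.{b}',
                n=n[b],
                p=phi_gb[g],
                observed=observed.loc[b],
            )
        trace = pm.sample(50000, tune=10000)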
Here, classes are the categories and batches are the samples.
Sampling this model with NUTS is very slow (~30 min on my laptop). I also tried Metropolis, which took less time, but I'm unsure which sampler is best for this model.
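To spell out what the model assumes, here is a rough NumPy sketch of its generative process, simulated forward from the priors. The sizes and totals are hypothetical placeholders, not my data, and this is only meant to mirror the structure of the model, not to replace it.

```python
import numpy as np

rng = np.random.default_rng(1)

n_conditions, n_classes = 2, 4                 # hypothetical sizes
batch_condition = [0, 0, 0, 1, 1, 1]           # condition index of each sample
batch_total = [500, 450, 520, 480, 510, 495]   # hypothetical totals n[b]

# PyMC's Gamma(alpha, beta) takes a rate; NumPy's gamma takes scale = 1 / rate.
v_g = rng.gamma(shape=1.0, scale=1.0 / 0.1)
a_gb = rng.gamma(shape=1.0, scale=1.0 / v_g, size=n_classes)

# One vector of class proportions per condition, sharing the concentration a_gb.
phi_gb = rng.dirichlet(a_gb, size=n_conditions)

# Each sample's counts are multinomial given its condition's proportions.
simulated = np.array([
    rng.multinomial(total, phi_gb[g])
    for g, total in zip(batch_condition, batch_total)
])
```

So both conditions share the same concentration vector `a_gb`, and the three samples of a condition share that condition's row of `phi_gb`.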
After execution with NUTS I get the following information:
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [phi_g.b, a_gb, v_g]
Sampling 2 chains, 262 divergences: 100%|██████████| 120000/120000 [30:56<00:00, 64.64draws/s]
There were 37 divergences after tuning. Increase `target_accept` or reparameterize.
There were 225 divergences after tuning. Increase `target_accept` or reparameterize.
The acceptance probability does not match the target. It is 0.6792032892250478, but should be close to 0.8. Try to increase the number of tuning steps.
The number of effective samples is smaller than 10% for some parameters.
As you can see, there are a lot of warnings. What might be the problem in this case?
Please let me know which additional information I can provide.
Thanks