Hi, sorry for getting back to this so much later.
I finally tried to apply the pm.Mixture() approach to my model, and I'm running into problems similar to those I had with a naive mixture (where the dummy variable is not marginalized out). After increasing target_accept to 0.9 I no longer get divergences, but sampling is extremely slow and the chains still get stuck in different regions. Is there something wrong with the model, or is marginalizing out the dummy variable simply not sufficient in a case like this?
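For context, observations, staff, and p0 come from my data (the observed values, the voice index of each observation, and the preceding pitch, respectively). A purely illustrative stand-in, just to make the snippet below self-contained, could look like this:

import numpy as np

# purely illustrative stand-in data, NOT the real corpus
rng = np.random.default_rng(0)
N = 2000                                        # number of observations
observations = rng.geometric(0.3, size=N) - 1   # zero-based values (hence the +1 below)
staff = rng.integers(0, 4, size=N)              # voice index per observation (4 voices)
p0 = rng.normal(0.0, 1.0, size=N)               # preceding pitch, standardized here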
Here is the full model specification:
import numpy as np
import pymc as pm

with pm.Model() as model_mixture:
    # mixture weights over the three competing models
    model_weights = pm.Dirichlet("model_weights", np.ones(3))

    # model 1: a single global parameter
    theta_global = pm.Beta("theta_global", 0.5, 0.5)

    # model 2: one parameter per voice (4 voices), indexed by staff
    theta_voice = pm.Beta("theta_voice", np.full(4, 0.5), np.full(4, 0.5))

    # model 3: parameter depends on the preceding pitch p0 via a logistic link
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    theta_register = pm.math.sigmoid(p0 * a + b)

    # mixture components: one Geometric distribution per candidate model
    components = [
        pm.Geometric.dist(theta_global),
        pm.Geometric.dist(theta_voice[staff]),
        pm.Geometric.dist(theta_register),
    ]

    # observed values, shifted by 1 onto the Geometric support {1, 2, ...}
    pm.Mixture("obs", w=model_weights, comp_dists=components, observed=observations + 1)

    idata_mixture = pm.sample(1000, chains=4, target_accept=0.9)
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4882 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
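In case it's useful, this is roughly how I'm looking at the chains (standard ArviZ calls):

import arviz as az

# summary across chains (r_hat flags the disagreement between chains)
print(az.summary(idata_mixture, var_names=["model_weights"]))

# per-chain posterior means show the chains sitting in different regions
print(idata_mixture.posterior["model_weights"].mean(dim="draw"))

# trace plot
az.plot_trace(idata_mixture)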
And here is the trace:
Would be curious to hear what you think about this.
