How to do model comparison with a dummy variable

Hi, sorry for getting back to this so much later.

I finally got around to applying the pm.Mixture() approach to my model, and I'm running into problems similar to those I had with the naive mixture (where the dummy variable is not marginalized out). After increasing target_accept to 0.9 I no longer get divergences, but sampling is extremely slow and the chains still get stuck in different regions. Is there something wrong with the model, or is marginalizing out the dummy variable simply not sufficient in a case like this?
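(For context, by "naive mixture" I mean roughly the following kind of setup, with an explicit model indicator that is not marginalized out. This is a simplified sketch rather than my exact earlier code:)

import numpy as np
import pymc as pm

with pm.Model() as model_indexed:
    model_weights = pm.Dirichlet("model_weights", np.ones(3))

    theta_global = pm.Beta("theta_global", 0.5, 0.5)
    theta_voice = pm.Beta("theta_voice", np.full(4, 0.5), np.full(4, 0.5))
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)

    # explicit (non-marginalized) dummy variable selecting the model
    z = pm.Categorical("z", p=model_weights)

    # per-observation success probability under each of the three models
    theta_stack = pm.math.stack([
        theta_global * np.ones_like(observations, dtype=float),  # model 1: global parameter
        theta_voice[staff],                                       # model 2: parameter per voice
        pm.math.sigmoid(p0 * a + b),                              # model 3: preceding pitch
    ])

    pm.Geometric("obs", p=theta_stack[z], observed=observations + 1)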

Here is the full model specification:

import numpy as np
import pymc as pm

# p0 (preceding pitch of each note), staff (voice index per note, 0-3), and
# observations (the observed data) are numpy arrays defined earlier.

with pm.Model() as model_mixture:
    # model weights
    model_weights = pm.Dirichlet("model_weights", np.ones(3))

    # model 1: single global parameter
    theta_global = pm.Beta("theta_global", 0.5, 0.5)

    # model 2: one parameter per voice
    theta_voice = pm.Beta("theta_voice", np.full(4, 0.5), np.full(4, 0.5))

    # model 3: parameter depends on the preceding pitch
    a = pm.Normal("a", 0, 10)
    b = pm.Normal("b", 0, 10)
    theta_register = pm.math.sigmoid(p0 * a + b)

    # mixture components (the dummy variable is marginalized out by pm.Mixture)
    components = [
        pm.Geometric.dist(p=theta_global),
        pm.Geometric.dist(p=theta_voice[staff]),
        pm.Geometric.dist(p=theta_register),
    ]

    # observation; +1 shifts the data onto the Geometric support, which starts at 1
    pm.Mixture("obs", w=model_weights, comp_dists=components, observed=observations + 1)

    idata_mixture = pm.sample(1000, chains=4, target_accept=0.9)

Output:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 4882 seconds.
The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details
The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details
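In case it helps to see what I mean by "stuck in different regions": one way to check is to compare the per-chain summaries and the per-chain means of the mixture weights (standard ArviZ/xarray calls, nothing specific to my data):

import arviz as az

# r-hat and ESS for the weights and the component parameters
print(az.summary(idata_mixture, var_names=["model_weights", "theta_global", "a", "b"]))

# per-chain means of the mixture weights: if the chains are stuck in
# different modes, these differ substantially between chains
print(idata_mixture.posterior["model_weights"].mean(dim="draw"))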

And here is the trace:
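(Generated with something along these lines:)

import arviz as az

az.plot_trace(idata_mixture, var_names=["model_weights", "theta_global", "theta_voice", "a", "b"])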

Would be curious to hear what you think about this.