Mixture of hierarchical model

Hi,

I tried what you suggested, without any luck (results attached at the end). Do you have any other ideas?

Alternatively, do you think I could just model the data with a simple varying-intercept-and-slope model (https://docs.pymc.io/notebooks/multilevel_modeling.html) and then say something about the two clusters a posteriori? If so, can you suggest a method?
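To make the a-posteriori idea concrete, here is one simple option (a sketch, not a recommendation): fit the plain varying-intercept/slope model first, then cluster the per-subject posterior-mean slopes with a two-means pass. The `b_means` array below is synthetic and stands in for something like `trace['b'].mean(axis=0)` from the fitted model:

```python
import numpy as np

# Synthetic stand-in for per-subject posterior-mean slopes,
# e.g. trace['b'].mean(axis=0) after fitting the hierarchical model
rng = np.random.default_rng(0)
b_means = np.concatenate([rng.normal(-0.1, 0.02, 10),
                          rng.normal(0.1, 0.02, 10)])

# Simple two-means clustering (Lloyd's algorithm in 1-D),
# initialised at the extremes
centers = np.array([b_means.min(), b_means.max()])
for _ in range(20):
    labels = np.abs(b_means[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([b_means[labels == j].mean() for j in range(2)])

print(labels)   # cluster assignment per subject
print(centers)  # estimated cluster centres
```

This sidesteps the mixture likelihood entirely, at the cost of ignoring posterior uncertainty in the cluster assignment.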

Results of the suggested modifications:

import numpy as np
import pymc3 as pm
import theano.tensor as tt
from theano import shared

# time, n_sbj, subjects, observed are defined earlier from the data
k = 2
x_t = time[:, np.newaxis]
x_time = shared(x_t, broadcastable=(False, True))

with pm.Model() as varying_intercept:

    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sd=0.1, shape=k)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=0.1, shape=k, testval=np.asarray([-.1, .1]))
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Break label switching by forcing mu_b[0] < mu_b[1]
    order_means_potential = pm.Potential('order_means_potential',
                                         tt.switch(mu_b[1] - mu_b[0] < 0, -np.inf, 0))

    # Random intercepts and slopes (one column per mixture component)
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_sbj, k))
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_sbj, k))

    # Model error
    sd_y = pm.HalfCauchy('sd_y', 5)

    # Expected value
    y_hat = a[subjects, :] + b[subjects, :] * x_time

    # Data likelihood
    p = pm.Dirichlet('p', np.ones(k))
    likelihood = pm.NormalMixture('likelihood', p, y_hat, sd=sd_y, observed=observed)

    varying_intercept_trace = pm.sample(5000, n_init=50000, tune=1000, init='adapt_diag')[1000:]

Auto-assigning NUTS sampler…
Initializing NUTS using adapt_diag…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p_stickbreaking__, sd_y_log__, b, a, sigma_b_log__, mu_b, sigma_a_log__, mu_a]
100%|██████████| 6000/6000 [00:58<00:00, 102.81it/s]
There were 2997 divergences after tuning. Increase target_accept or reparameterize.
There were 4073 divergences after tuning. Increase target_accept or reparameterize.
There were 2945 divergences after tuning. Increase target_accept or reparameterize.
There were 2646 divergences after tuning. Increase target_accept or reparameterize.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
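On the "reparameterize" part of those warnings: the usual fix for divergences in hierarchical models is the non-centered parameterization, i.e. sampling `b_offset ~ Normal(0, 1)` and setting `b = mu_b + sigma_b * b_offset` instead of sampling `b` directly. A minimal numpy check that the two parameterizations describe the same distribution (the `mu_b`/`sigma_b` values here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_b, sigma_b = 0.1, 0.5
n = 100_000

# Centered: draw b ~ Normal(mu_b, sigma_b) directly
b_centered = rng.normal(mu_b, sigma_b, n)

# Non-centered: draw a standard normal offset and shift/scale it
b_offset = rng.normal(0.0, 1.0, n)
b_noncentered = mu_b + sigma_b * b_offset

# Same distribution, but the non-centered geometry is much
# easier for NUTS when sigma_b is poorly identified
print(b_centered.mean(), b_noncentered.mean())
print(b_centered.std(), b_noncentered.std())
```

In the model this would look something like `b_offset = pm.Normal('b_offset', 0., 1., shape=(n_sbj, k))` followed by `b = pm.Deterministic('b', mu_b + sigma_b * b_offset)` (and similarly for `a`). This is a standard trick, not something specific to the mixture; it may or may not be enough here given the label-switching issues.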