The data I am trying to model are behavioural scores from 20 subjects. For each subject, we collected approximately 15 trials before treatment and 15 trials after treatment. In the following image, each dot represents a subject's average score:
Unfortunately, the data are proprietary and can’t be shared, but I will try my best to characterise the problem.
As you can see, subjects with a pre-treatment score above 0.011 have a negative slope (let's call this the "decreasing" group), while subjects below that threshold have a positive slope (the "increasing" group).
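Since I can't share the data, here is a small numpy sketch that generates synthetic data with the same structure (20 subjects, 15 pre- and 15 post-treatment trials each, the 0.011 threshold); the scales and noise level are made up:

```python
import numpy as np

rng = np.random.RandomState(0)

n_sbj = 20
n_trials = 15          # trials per subject, per phase
threshold = 0.011      # pre-treatment score separating the two groups

# Subject-level intercepts straddling the threshold (made-up scale)
a_true = rng.uniform(0.005, 0.017, size=n_sbj)
# Negative slope above the threshold, positive slope below it
b_true = np.where(a_true > threshold, -0.003, 0.003)

# One row per trial: subject ID, phase indicator, and noisy score
subjects = np.repeat(np.arange(n_sbj), 2 * n_trials)
x_time = np.tile(np.repeat([0, 1], n_trials), n_sbj)   # 0 = pre, 1 = post
observed = (a_true[subjects] + b_true[subjects] * x_time
            + rng.normal(0., 0.001, size=subjects.size))
```

These arrays (`observed`, `x_time`, `subjects`) have the same shape and roles as the ones used in the models below.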
I have successfully modelled the data using a hierarchical model with varying intercepts and slopes:
However, I would like to have a model that automatically clusters subjects in the groups mentioned above (i.e. decreasing/increasing).
For this purpose I am trying to fit a “mixture of hierarchical models” to this data:
```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

k = 2  # number of clusters

# observed: vector of observations
# x_time:   for each observation, 0 for pre-treatment, 1 for post-treatment
# subjects: for each observation, the subject ID

with pm.Model() as model:
    # Cluster sizes
    p = pm.Dirichlet('p', np.ones(k))
    # Ensure all clusters have some points
    p_min_potential = pm.Potential('p_min_potential',
                                   tt.switch(tt.min(p) < .1, -np.inf, 0))
    # Latent cluster of each subject
    category = pm.Categorical('category', p=p, shape=n_sbj)

    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sd=1e5, shape=k)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=1e5, shape=k)
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # Intercepts
    a = pm.Normal('a', mu=mu_a[category], sd=sigma_a, shape=n_sbj)
    # Slopes
    b = pm.Normal('b', mu=mu_b[category], sd=sigma_b, shape=n_sbj)

    # Model error
    sd_y = pm.HalfCauchy('sd_y', 5)

    # Expected value
    y_hat = a[subjects] + b[subjects] * x_time

    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sd=sd_y, observed=observed)
```
The sampling returns several warnings:
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sd_y_log__, b, a, sigma_b_log__, mu_b, sigma_a_log__, mu_a, p_stickbreaking__]
100%|██████████| 6000/6000 [00:40<00:00, 149.02it/s]
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
And the traces look like this:
And the forestplot like this:
Do you have any idea how to fix the model?
Or, if you think there is a better way of modelling these data, please let me know.
Thank you in advance.