Mixture of hierarchical models

Hello,

the data I am trying to model is composed of behavioural scores from 20 subjects. For each subject, we collected approximately 15 trials before treatment and 15 trials after treatment. In the following image, each dot represents the average score of each subject:

Unfortunately, the data are proprietary and can’t be shared, but I will try my best to characterise the problem.

As you can see, subjects with a pre-treatment score above 0.011 have a negative slope (let’s call this “decreasing group”), while subjects below that threshold have a positive slope (let’s call this “increasing group”).
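To restate that observation concretely (with made-up numbers, since the real data are proprietary): given each subject's mean pre- and post-treatment scores, the described grouping amounts to thresholding the pre-treatment mean.

```python
import numpy as np

# hypothetical per-subject mean scores (the real data can't be shared)
pre = np.array([0.013, 0.009, 0.012, 0.008])
post = np.array([0.011, 0.010, 0.010, 0.009])

slope = post - pre  # change from pre- to post-treatment
group = np.where(pre > 0.011, 'decreasing', 'increasing')
```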

I have successfully modelled the data using a hierarchical model with varying intercepts and slopes:
https://docs.pymc.io/notebooks/multilevel_modeling.html

However, I would like to have a model that automatically clusters subjects in the groups mentioned above (i.e. decreasing/increasing).

For this purpose I am trying to fit a “mixture of hierarchical models” to this data:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt
from pymc3 import Model, Normal, HalfCauchy

k = 2
# observed: vector of observations
# x_time:   for each observation, 0 for pre-treatment, 1 for post-treatment
# subjects: for each observation, the subject ID
# n_sbj:    number of subjects

with Model() as model:
    # cluster sizes
    p = pm.Dirichlet('p', np.ones(k))
    # ensure all clusters have some points
    p_min_potential = pm.Potential('p_min_potential',
                                   tt.switch(tt.min(p) < .1, -np.inf, 0))

    # latent cluster of each subject
    category = pm.Categorical('category', p=p, shape=n_sbj)

    # Priors
    mu_a = Normal('mu_a', mu=0., sd=1e5, shape=k)
    sigma_a = HalfCauchy('sigma_a', 5)
    mu_b = Normal('mu_b', mu=0., sd=1e5, shape=k)
    sigma_b = HalfCauchy('sigma_b', 5)

    # Intercepts
    a = Normal('a', mu=mu_a[category], sd=sigma_a, shape=n_sbj)

    # Slopes
    b = pm.Normal('b', mu=mu_b[category], sd=sigma_b, shape=n_sbj)

    # Model error
    sd_y = HalfCauchy('sd_y', 5)

    # Expected value
    y_hat = a[subjects] + b[subjects] * x_time

    # Data likelihood
    y_like = Normal('y_like', mu=y_hat, sd=sd_y, observed=observed)
```

The sampling returns several warnings:

```
Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
NUTS: [sd_y_log__, b, a, sigma_b_log__, mu_b, sigma_a_log__, mu_a, p_stickbreaking__]
BinaryGibbsMetropolis: [category]
100%|██████████| 6000/6000 [00:40<00:00, 149.02it/s]
There were 5 divergences after tuning. Increase `target_accept` or reparameterize.
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
There were 3 divergences after tuning. Increase `target_accept` or reparameterize.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
```

And the traces look like this:

And the forest plot looks like this:

Do you have any idea how to fix the model?
Or if you think that there is a better way of modelling this data, please let me know.

There are quite a few discussions here on how to improve the sampling of mixture models, e.g.:
Properly sampling mixture models. In your case, you should try to:

1. rewrite the model using `pm.Mixture`
A marginalized mixture is usually better, because you can then use NUTS to sample all parameters; mixing a Gibbs sampler with NUTS does not always work well.

2. restrict the order of the slopes
This limits the mode-switching (label-switching) problem; see How to improve fit of Poisson Mixture Model in PyMC3?

3. use better priors
More informative priors for `mu_a` and `mu_b` would also help.
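Regarding mode switching, an alternative (or complement) to an in-model ordering constraint is to relabel the components after sampling, sorting each posterior draw by the component slopes. A minimal numpy sketch, assuming `mu_b_samples` is a hypothetical `(n_draws, k)` array of posterior draws for `mu_b`, with other component-indexed draws relabeled consistently:

```python
import numpy as np

def relabel_by_slope(mu_b_samples, other_samples):
    """Relabel mixture components draw-by-draw so that the component
    slopes mu_b are in increasing order within every draw.

    mu_b_samples:  (n_draws, k) posterior draws of the component slopes
    other_samples: dict of name -> (n_draws, k) arrays that share the
                   same component labels (e.g. mu_a, p)
    """
    order = np.argsort(mu_b_samples, axis=1)          # per-draw ordering
    rows = np.arange(mu_b_samples.shape[0])[:, None]  # row index for fancy indexing
    relabeled = {name: s[rows, order] for name, s in other_samples.items()}
    return mu_b_samples[rows, order], relabeled

# toy example: the second draw has swapped component labels
mu_b = np.array([[-0.1, 0.1],
                 [0.2, -0.2]])
mu_a = np.array([[1.0, 2.0],
                 [4.0, 3.0]])
mu_b_sorted, rest = relabel_by_slope(mu_b, {'mu_a': mu_a})
```

This does not fix sampling problems (divergences, poor mixing), but it makes per-component summaries interpretable when chains visit both labelings.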


I am trying to rewrite the model using `pm.Mixture`, following other discussions:

However, I find it difficult to handle the fact that observations are “nested” in subjects. I tried several models (only two attached, i.e. Model 1 and Model 2), but none of them works. Could you be a bit more specific about how to rewrite the model?

Thanks.

Model 1

```python
import numpy as np
import pymc3 as pm
from pymc3 import Model, Normal, HalfCauchy

k = 2
x_time = time[:, np.newaxis]  # column of 0/1 time indicators

with Model() as varying_intercept:
    # Priors
    mu_a = Normal('mu_a', mu=0., sd=0.1, shape=k)
    sigma_a = HalfCauchy('sigma_a', 5)
    mu_b = Normal('mu_b', mu=0., sd=0.1, shape=k)
    sigma_b = HalfCauchy('sigma_b', 5)

    # Random intercepts and slopes (one per subject and component)
    a = Normal('a', mu=mu_a, sd=sigma_a, shape=(n_sbj, k))
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_sbj, k))

    # Model error
    sd_y = HalfCauchy('sd_y', 5)

    # Expected value (one column per mixture component)
    y_hat = a[subjects, :] + b[subjects, :] * x_time

    # Data likelihood
    p = pm.Dirichlet('p', np.ones(k))
    likelihood = pm.NormalMixture('likelihood', p, y_hat, sd=sd_y,
                                  observed=observed)
```

```
Auto-assigning NUTS sampler…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [p_stickbreaking__, sd_y_log__, b, a, sigma_b_log__, mu_b, sigma_a_log__, mu_a]
100%|██████████| 6000/6000 [01:26<00:00, 69.28it/s]
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```

Model 2

```python
import numpy as np
import pymc3 as pm
from pymc3 import Model, Normal, HalfCauchy
from theano import shared

k = 2
x_time = shared(time)

with Model() as varying_intercept:
    # cluster sizes
    p = pm.Dirichlet('p', np.ones(k))

    mu_a = pm.Normal('mu_a', mu=0, sd=0.1, shape=k)  # intercepts
    mu_b = pm.Normal('mu_b', mu=0, sd=0.1, shape=k)  # slopes

    sd_a = HalfCauchy('sd_a', 5, shape=k)
    sd_b = HalfCauchy('sd_b', 5, shape=k)

    # per-subject intercepts and slopes drawn from the mixture
    a = pm.NormalMixture('mix_mu_a', p, mu_a, sd=sd_a, shape=n_sbj)
    b = pm.NormalMixture('mix_mu_b', p, mu_b, sd=sd_b, shape=n_sbj)

    # Expected value
    y_hat = a[subjects] + b[subjects] * x_time

    # Model error
    sd_y = HalfCauchy('sd_y', 5)

    # Data likelihood
    y_like = Normal('y_like', mu=y_hat, sd=sd_y, observed=observed)
```

```
Auto-assigning NUTS sampler…
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [sd_y_log__, mix_mu_b, mix_mu_a, sd_b_log__, sd_a_log__, mu_b, mu_a, p_stickbreaking__]
100%|██████████| 6000/6000 [00:37<00:00, 160.68it/s]
There were 1 divergences after tuning. Increase `target_accept` or reparameterize.
The gelman-rubin statistic is larger than 1.4 for some parameters. The sampler did not converge.
The estimated number of effective samples is smaller than 200 for some parameters.
```

Model 1 is definitely in the right direction. However, you should still restrict the coefficients to make sure there is no mode switching. As a quick fix, try adding this constraint to your Model 1:

```python
    # break symmetry
    order_means_potential = pm.Potential(
        'order_means_potential',
        tt.switch(mu_b[1] - mu_b[0] < 0, -np.inf, 0))
```

Also, you might find some inspiration in this question:

I have two notebooks that explore different parameterizations and inference:
Hi, thanks for the quick replies.

I added the symmetry-breaking potential to Model 1, but I got:

```
ValueError: Bad initial energy: inf. The model might be misspecified.
```

With respect to your second suggestion, I am not sure where I should get inspiration from. Are you suggesting to “switch labels by hand”?

Try `pm.sample(init='adapt_diag')`, and also provide a test value for `mu_b`: `mu_b = Normal('mu_b', mu=0., sd=0.1, shape=k, testval=np.asarray([-.1, .1]))`.

Those two notebooks are mixture regressions; sorry, there are not many comments, but you can see different ways of modelling it.

Hi,

I tried as you suggested, without any luck (results attached at the end). Do you have other ideas?

Alternatively, do you think I could just model the data using a simple varying-intercept-and-slope model (https://docs.pymc.io/notebooks/multilevel_modeling.html) and then say something about the two clusters a posteriori? In that case, can you suggest a method?
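One simple a-posteriori approach, assuming a fitted varying-intercept/varying-slope model, is to look at the posterior of each subject's slope and assign subjects to the increasing/decreasing group by the posterior probability that the slope is positive. A sketch with a hypothetical `b_samples` array of shape `(n_draws, n_sbj)` standing in for the trace:

```python
import numpy as np

def cluster_by_slope_sign(b_samples, threshold=0.95):
    """Assign each subject to 'increasing', 'decreasing', or 'uncertain'
    based on the posterior probability that its slope is positive.

    b_samples: (n_draws, n_sbj) posterior draws of per-subject slopes
    """
    p_pos = (b_samples > 0).mean(axis=0)  # P(b_j > 0) per subject
    labels = np.where(p_pos >= threshold, 'increasing',
                      np.where(p_pos <= 1 - threshold, 'decreasing',
                               'uncertain'))
    return p_pos, labels

# toy posterior: subject 0 clearly positive, subject 1 clearly negative
rng = np.random.default_rng(0)
b = np.column_stack([rng.normal(0.5, 0.1, 1000),
                     rng.normal(-0.5, 0.1, 1000)])
p_pos, labels = cluster_by_slope_sign(b)
```

This sidesteps the label-switching issue entirely, at the cost of not sharing information across the two groups during fitting.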

Results of the suggested modifications:

```python
import numpy as np
import pymc3 as pm
import theano.tensor as tt

k = 2
x_time = time[:, np.newaxis]

with pm.Model() as varying_intercept:
    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sd=0.1, shape=k)
    sigma_a = pm.HalfCauchy('sigma_a', 5)
    mu_b = pm.Normal('mu_b', mu=0., sd=0.1, shape=k,
                     testval=np.asarray([-.1, .1]))
    sigma_b = pm.HalfCauchy('sigma_b', 5)

    # break symmetry: force mu_b[0] <= mu_b[1]
    order_means_potential = pm.Potential(
        'order_means_potential',
        tt.switch(mu_b[1] - mu_b[0] < 0, -np.inf, 0))

    # Random intercepts and slopes
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_sbj, k))
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_sbj, k))

    # Model error
    sd_y = pm.HalfCauchy('sd_y', 5)

    # Expected value
    y_hat = a[subjects, :] + b[subjects, :] * x_time

    # Data likelihood
    p = pm.Dirichlet('p', np.ones(k))
    likelihood = pm.NormalMixture('likelihood', p, y_hat, sd=sd_y,
                                  observed=observed)
```

```
Auto-assigning NUTS sampler…
There were 2997 divergences after tuning. Increase `target_accept` or reparameterize.
There were 4073 divergences after tuning. Increase `target_accept` or reparameterize.
There were 2945 divergences after tuning. Increase `target_accept` or reparameterize.
There were 2646 divergences after tuning. Increase `target_accept` or reparameterize.
```