Hierarchical Gaussian Mixture Regression Model


Help needed in modelling

Implementing a multilevel model as follows:

Level 1 :

y = intercept + x * slope + error

Level 2 :

slope represents Gaussian Mixture Model

=> GMM(slope | mean, S.D)

with pm.Model() as varying_intercept:
# cluster sizes
p = pm.Dirichlet('p', np.ones(k))

mu_a = pm.Normal('mu_a', mu=0, sd=0.1, shape=k)  # intercept
mu_b = pm.Normal('mu_b', mu=0, sd=0.1, shape=k)  # slope

sd_a = pm.Normal('sd_a', 2, shape=k)
sd_b = pm.Normal('sd_b', 2, shape=k)

intercept = pm.NormalMixture('intercept', p, mu_a, sd=sd_a, shape=unique_subj)
slope = pm.NormalMixture('slope', p, mu_b, sd=sd_b, shape=unique_subj)

# Expected value
y_hat = intercept[id] + slope[id] * x_t

# Model error
sd_y = pm.Normal('sd_y', 10)

# Data likelihood
y_like = pm.Normal('y_like', mu=y_hat, sd=sd_y, observed=observed)
  1. Would like to know whether the written model is correct.
  2. Model has problems in converging.


A couple little things:

  1. For the standard deviations, use pm.HalfNormal in place of pm.Normal

  2. You shouldn’t need an extra level for error; the variances are already part of the GMM

  3. Right now, the slopes and intercepts are being treated as a mixture; instead of the y-values

  4. The cluster weights are not being treated hierarchically

For sharing weights I might do something like:

def logp_alpha(a, l=0.5):
    return tt.log(tt.exp(-l * a))

with pm.Model() as linear_gmm:
  # prior on mixtures: try to be uninformative, but penalize very large values of ||a||
  alpha_prior = pm.Flat('alpha_prior', shape=n_clusters)
  pm.Potential('p(alpha)', logp_alpha(alpha))


Oh, I see. You want the slope and intercept to be like a mixed model, for data that looks something like this:


You’re nearly there; but I wouldn’t treat the intercepts and slopes as separate mixture models, but rather I’d treat the “mean” of a cluster as the predicted response for that cluster. That is, each cluster is a “line”, and the standard-deviation is the residual deviation.

I put the intercept term and x into a design matrix X and it looks something like:

with pm.Model() as mod:
    # priors
    beta_prior = pm.Normal('bprior', 0., 5., shape=(2,))
    beta_se = pm.HalfNormal('bse', 5., shape=(2,))
    # parameters
    weights = pm.Dirichlet('weights', a=np.array([2.]*k))
    # non-centered parameterization
    beta_raw = pm.Normal('beta_raw', 0., 1., shape=(k, 2))
    beta = pm.Deterministic('beta', beta_raw*beta_se + beta_prior)
    cluster_means = pm.Deterministic('cl_means', tt.dot(X, tt.transpose(beta)))  # (n x 2) x (2 x k)
    cluster_sds = pm.HalfNormal('cl_sds', 1., shape=(k,))
    lik = pm.NormalMixture('y', w=weights, mu=cluster_means, sd=cluster_sds, observed=y)
    res = pm.sample(350, init='jitter+adapt_diag', tune=500, chains=3, cores=1, nuts_kwargs={'max_treedepth': 25})

The true betas are

intercepts = np.array([-3.14, 0.151, 2.681])
slopes = np.array([3.1, -1.2, 0.50])

so basically spot on. Posterior:



Thanks for your reply.

The dataset has longitudinal data. The data looks something like thisdata.csv (97.1 KB)

Data Description and Model :

  1. The subjects follows a linear trend.
  2. I would like to fit a linear model in level 1 for each subject and obtain the slope and intercept and pass it to level 2 model
  3. In level 2, The Gaussian mixture clusters the slope and intercept obtained from the level 1.


I think you’ll find that’s what the model does. Each sample is treated as having an intercept or slope from one of k clusters (with unknown mean and (co)variance).

Your model draws each slope/intercept independently for each sample from the mixture model. The way you would include this in a hierarchical model (with replicates) is by allowing offsets for each of the individuals from each of the clusters. This can quickly lead to over-parameterization, since there are an additional n \times k parameters.

If you have the N \times n indicator matrix mapping observations to sample identifiers sample_ids you can do this:

sample_beta_offsets = pm.Normal('per_sample_offsets', 0., 0.1, shape=(n, k, 2))
    beta_sample = pm.Deterministic('per_sample_beta', tt.transpose(tt.dot(tt.transpose(sample_beta_offsets), sample_ids.T)) + beta)
    # Each sample has its own (slope, int) for each cluster
    # (N, k, 2)
    # observe (N, 2) values X
    # Multiplying the values for a particular sample gives the cluster mean
    # (i, k, 2) * X[i,:] = mean_i
    cluster_means = pm.Deterministic('cl_means', tt.stack([tt.dot(beta_sample[i,:,:], X[i,:]) for i in range(sample_ids.shape[0])]))

but it’s really painful…



Mea culpa.

Your model is basically correct. It only fails to sample because of using Normals rather than HalfNormals on the standard deviations.

Test data:

import numpy as np
import seaborn as sbn
import pymc3 as pm
import theano.tensor as tt
import pandas as pd

cluster_props = [0.25, 0.75]
n_clusters = len(cluster_props)
n_samples = 80


slopes = np.random.standard_t(1., size=(n_clusters,)) * 3
intercepts = np.random.standard_t(1., size=(n_clusters,)) * 5

def mksample(n_meas):
    measurements = np.random.uniform(low=0., high=10., size=(n_meas,))
    clust = np.where(np.random.multinomial(1, pvals=cluster_props) > 0.1)[0]
    clust2 = np.where(np.random.multinomial(1, pvals=cluster_props) > 0.1)[0]
    slope, int_ = slopes[clust] + np.random.normal(0.1), intercepts[clust2] + np.random.normal(0.1)
    y = int_ + slope * measurements + np.random.normal(0, 0.5, size=(n_meas,))
    return pd.DataFrame({'x': measurements, 'y': y})

data = {'sample_%d' % (1+i): mksample(1 + np.random.binomial(9, 0.5)) for i in range(n_samples)}



with pm.Model() as hm_model:
    # priors
    weight_prior_s = pm.Dirichlet('wps', 2. * np.ones(k))
    weight_prior_i = pm.Dirichlet('wpi', 2. * np.ones(k))
    clust_slope_means = pm.Normal('mu_slope', 0., 5., shape=(k,))
    clust_int_means = pm.Normal('mu_int', 0., 5., shape=(k,))
    slope_offset_err = pm.HalfNormal('se_slope', 2.)
    int_offset_err = pm.HalfNormal('se_int', 2.)
    resid_err = pm.HalfNormal('resid_err', 1.)
    # the mixture model
    slope_prior = pm.NormalMixture('slope_mix', w=weight_prior_s, mu=clust_slope_means, sd=slope_offset_err, shape=(len(data),))
    int_prior = pm.NormalMixture('int_mix', w=weight_prior_i, mu=clust_int_means, sd=int_offset_err, shape=(len(data),))
    # the observed data
    for i, (sample, samdat) in enumerate(data.items()):
        sam_slope = slope_prior[i]
        sam_int = int_prior[i]
        lik_sam = pm.Normal('%s_obs' % sample, sam_int + sam_slope * samdat['x'], resid_err, observed=samdat['y'])
    res = pm.sample(150, chains=4, cores=2, nuts_kwargs={'target_accept': 0.8, 'max_treedepth': 10})

It samples quickly and well:

The slope_mix trace has a line that looks weird and all-over-the-place; but this comes from a sample with only one observation:


In sum: The following could be occurring for your model

  1. Bad parametrization of standard errors (use HalfNormal instead)
  2. Poor identifiability (no clear clusters for mean/slope; or incorrect # of clusters)
  3. Overparametrization (insufficient # of samples, or measurements per sample for all the parameters)