Mixture model with variable number of component distributions

Hi PyMC3 community,

I am trying to fit a mixture model where each observation (s) is sampled from one of a number of component distributions. The number of component distributions is unknown and is to be estimated.
The possible component distributions are one Uniform(0, 25) (to model contaminants) and an unknown number (z) of Normal distributions, where z can be one of {0, 1, 2, 3, 4, 5}.

I am trying to infer:

  • The number of normal distributions: z
  • The mean of each of the normal distributions: gss
  • A number of standard deviations (`r_E`) associated with the normal distributions, which are shared across observations.
  • The mapping from each data point to each component distribution: w, where w <= z

The model seems to produce reasonable estimates of P(gss | z), but z rarely changes. When I print z, I can see different values being proposed, but the trace contains a constant value for each chain, which leads me to believe that alternate values of z are always rejected.

I also get a lot of divergences and some bad Gelman-Rubin statistics.

Code and Data.


  • Can I parameterize my model differently for more effective sampling / fewer divergences?
  • How can I sample from z more effectively?
  • Can I / should I use different samplers?

I've also posted the code here:

with pm.Model() as model:

    # True s values
    gss = pm.Uniform('gss', lower=0., upper=25., shape=(1, max_s))
    gss_per_obs = gss[data['epoch_i'], :]  # broadcast, 1 for each obs

    # The number of gss
    gss_prior = pm.Dirichlet('gss_prior', a=s_number_prior)
    z = pm.Categorical('z', p=gss_prior).reshape((1, 1))  # reshape to avoid a theano reshape error
    z_per_obs = z[data['epoch_i']]  # broadcast, 1 for each obs

    # Lower-triangular matrix: row z allows only components 0..z
    w_prior_possibilities = tt.tril(tt.ones((max_s + 1, max_s + 1)))
    w = pm.Categorical('w', p=w_prior_possibilities[z_per_obs[:, 0], :], shape=n_obs)

    # --- Raters' ability to detect gss --- #
    r_E = pm.Bound(pm.Normal, lower=0.)('r_E', mu=0.5, sd=0.5, shape=n_raters)
    r_E_per_obs = r_E[data['rater_i']]

    # --- Behaviour --- #
    contaminate_dist_s = pm.Uniform.dist(lower=0., upper=25., shape=n_obs)
    contaminate_dist_s.mean = 12.5
    possible_dists = [contaminate_dist_s]
    for i in range(5):
        dist = pm.Normal.dist(mu=gss_per_obs[:, i], sd=r_E_per_obs)
        dist.mean = gss_spindle_testvals[i]
        possible_dists.append(dist)  # collect the normal components

    w_array = tt.extra_ops.to_one_hot(w, nb_class=max_s + 1)
    s = pm.Mixture('s', w=w_array, comp_dists=possible_dists,
                   observed=data['s'])  # observed kwarg assumed from context

I’ve added some questions that seem related to my problem below, for anyone who might come across a similar issue later down the track.

In general, there is no good way to infer an unknown number of mixture components in PyMC3 (I would say there is none in general either, as the problem is really difficult). My suggestion is to estimate a truncated mixture model, where you always have more mixture components than you need. If the true latent distribution contains only 2 components, the remaining components will just end up with very small weights.
You can have a look at https://docs.pymc.io/notebooks/dp_mix.html
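For later readers: the key construction in that notebook is stick-breaking, which turns Beta draws into mixture weights that decay across components, so unused components naturally get weight near zero. Here is a minimal numpy sketch of just the weight construction (the function name, `alpha`, and the truncation level K = 6 matching the 5 normals + 1 uniform above are my choices for illustration):

```python
import numpy as np

def stick_breaking(beta):
    """Convert Beta(1, alpha) draws into decaying mixture weights.

    weight_k = beta_k * prod_{j < k} (1 - beta_j)
    """
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - beta)[:-1]])
    return beta * remaining

# Truncate at K = 6 components (5 normals + 1 uniform, as in the model above).
alpha = 1.0
K = 6
rng = np.random.default_rng(0)
beta = rng.beta(1.0, alpha, size=K)
weights = stick_breaking(beta)
# Later components get geometrically less of the remaining "stick",
# so superfluous components end up with weight near zero.
```

In the notebook, the same construction is done with theano ops on a `pm.Beta` variable so the weights are learned rather than simulated.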

Great thanks!

Sorry, I don’t have the probability language to express my model, but I think I have a “truncated mixture model”, where the number of components is bounded at 6 (5 normals and one uniform).

I read through the link you posted, and to be honest, I don’t understand how to extend it to my problem.
I would like a sparse w for each observation where, for example, if only 2 components are likely needed to explain the data, then w would likely be [1 0 0 0 0 0] or [0 1 0 0 0 0].
However, in that example, there is only a single w. Also, is there a way to force w to be sparse?
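(For context on the sparsity question, this is an illustration rather than anything from the notebook: a Dirichlet prior with concentration well below 1 concentrates most of each weight vector's mass on a single component, while a large concentration spreads it out. A quick numpy demonstration:)

```python
import numpy as np

rng = np.random.default_rng(42)
K, n = 6, 2000

# Draw many weight vectors under a sparse and a diffuse Dirichlet prior.
sparse_w = rng.dirichlet(np.full(K, 0.1), size=n)    # concentration << 1
diffuse_w = rng.dirichlet(np.full(K, 10.0), size=n)  # concentration >> 1

# With concentration 0.1, the largest weight is typically close to 1,
# i.e. most of the mass sits on one component; with 10.0 the weights
# stay near the uniform 1/K.
print(sparse_w.max(axis=1).mean())
print(diffuse_w.max(axis=1).mean())
```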

I’ve drawn a little picture to give some more context on the data: