What is the advantage of using pymc's Mixture distribution in a latent mixture model?

I’m going through the Bayesian Cognitive Modeling book, and one of the exercises gives the following latent mixture model for students getting N questions right on an exam with 40 questions. Students scoring around 20 are assumed to be guessing, and the goal is to determine who is likely guessing and who studied.

import numpy as np
import pymc as pm  # or: import pymc3 as pm, depending on your version

scores = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 35]
with pm.Model() as model:
    # z_i = 1 if student i studied, 0 if they were guessing
    zi = pm.Bernoulli('zi', p=0.5, shape=len(scores))
    # success probability for students who studied
    phi = pm.Uniform('phi', 0.5, 1, shape=len(scores))
    # success probability for students who guessed
    psi = 0.5
    # theta_i = phi_i if z_i == 1, otherwise psi
    theta = pm.Deterministic('theta', pm.math.eq(zi, 1)*phi + pm.math.eq(zi, 0)*psi)

    pm.Binomial('obs', p=theta, n=40, observed=scores)
    traces = pm.sample(2000, tune=10000, cores=4)

I wanted to see what the difference is between that model and one using PyMC’s Mixture class, so I came up with a different model:

with pm.Model() as mixture_model:
    # mixture weights over the two components
    w = pm.Dirichlet('w', a=np.ones(2))
    # success probability for the "studied" component
    sp = pm.Uniform('sp', 0.5, 1, shape=len(scores))
    dist1 = pm.Binomial.dist(p=sp, n=40, shape=len(scores))                # studied
    dist2 = pm.Binomial.dist(p=[0.5]*len(scores), n=40, shape=len(scores)) # guessing
    mixt = pm.Mixture('mixt', w=w, comp_dists=[dist1, dist2], observed=scores)
    traces = pm.sample(3000, tune=1000, cores=4)
  1. Are these models essentially the same thing? The second model seems a bit limiting in that I can’t estimate the theta value for an individual user.

  2. In the first model, I can answer questions like “What percent of the posterior distribution of theta for a given user is > 0.5?”, which gives me a clue about which group the user belongs to. Is there a way to directly ask what the probability is that a user belongs to group 1 vs group 2?

  3. Does w in the second model represent what % of users belong to each group?

  4. In the second model, how do I answer "What is the probability that user1 belongs to group 1 vs group 2?"

The advantage of using the mixture model is that PyMC will be able to use NUTS for sampling. In your first model, you have discrete stochastic variables (the z’s), so gradient-based sampling will not work.
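
Under the hood, Mixture marginalizes the discrete group indicator out of the likelihood: each score’s likelihood becomes a weighted sum of the two Binomial likelihoods, leaving only continuous parameters for NUTS. Roughly something like this (just a scipy sketch of the idea, not PyMC’s actual implementation):

from scipy.stats import binom

# Per-student marginal likelihood with the group indicator z summed out:
# P(score) = w[0] * Binomial(score | 40, sp) + w[1] * Binomial(score | 40, 0.5)
def marginal_likelihood(score, w, sp, n=40, psi=0.5):
    return w[0] * binom.pmf(score, n, sp) + w[1] * binom.pmf(score, n, psi)

print(marginal_likelihood(21, w=[0.5, 0.5], sp=0.7))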

Regarding your other questions, w does specify the proportion that belong to each group, so the answer to 3 is “yes”, and the answer to 4 is “By inferring from w”.
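
For example, the posterior of the group proportions can be inspected directly; a quick sketch, assuming ArviZ is installed and traces holds the mixture model’s samples:

import arviz as az

# Posterior summary of the mixture weights (one row per group)
az.summary(traces, var_names=["w"])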

Thanks Chris! That makes sense about using NUTS. It’s still not clear to me how I would infer a specific user’s probability of belonging to each group from w. w seems like a group-level parameter (what % of all users belong to group 1 or group 2), but certain users have a higher chance of belonging to one group or the other than the group average. How can you capture this from w?

Hi Chris @fonnesbeck, I’m late to the party on this post but intrigued nonetheless!

you have discrete stochastic variables (the z’s), so gradient-based sampling will not work

A few short questions:

  • I’m a little confused about when NUTS is/isn’t an option. Is it the discreteness of the zi that precludes gradient-based sampling, or the fact that it’s a multilevel model?
  • Does PyMC3 revert to a basic Metropolis sampler (or Gibbs sampler) when gradient info isn’t computable?
  • With the switch from Theano to JAX, will this be a non-issue in months to come?

Cheers!

@jbuddy_13 discrete variables prevent gradient-based sampling because you can’t calculate the derivative with respect to a discrete variable.

Late to the party, but it’s possible to retrieve the discrete variables that are implicitly marginalized by pm.Mixture after sampling is done. We show an example in the last section of this blogpost: Out of model predictions with PyMC - PyMC Labs

  1. Yes. Discrete variables don’t have well-defined gradients (you can’t take an infinitesimal step in a discrete variable).
  2. Yes.
  3. No, the same applies. There’s no switch to JAX though; it’s just a different computational backend that can be used to evaluate expressions.
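
To make that concrete for question 4 above, here is a rough manual sketch of computing each student’s posterior probability of belonging to the “studied” group from the mixture model’s posterior draws (the blogpost linked above shows a fuller example). It assumes traces is the InferenceData returned by pm.sample for the mixture model, and that the component order follows comp_dists=[dist1, dist2], i.e. w[0] is the studied group and w[1] the guessing group:

import numpy as np
from scipy.stats import binom

post = traces.posterior.stack(sample=("chain", "draw"))
w = post["w"].values    # shape (2, n_samples)
sp = post["sp"].values  # shape (n_students, n_samples)

scores_arr = np.array(scores)[:, None]              # shape (n_students, 1)
lik_study = w[0] * binom.pmf(scores_arr, 40, sp)    # studied component
lik_guess = w[1] * binom.pmf(scores_arr, 40, 0.5)   # guessing component

# Posterior membership probability per student, averaged over posterior draws
p_studied = lik_study / (lik_study + lik_guess)
print(p_studied.mean(axis=1))  # one probability per student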