What is the advantage of using pymc's Mixture distribution in a latent mixture model?

I’m going through the Bayesian Cognitive Modeling book, and one of the exercises gives the following latent mixture model for students getting N questions right on an exam with 40 questions. Students scoring around 20 are assumed to be guessing, and the goal is to determine who is likely guessing and who studied.

import numpy as np
import pymc as pm  # or: import pymc3 as pm, depending on your version

scores = [21, 17, 21, 18, 22, 31, 31, 34, 34, 35, 35, 36, 39, 35]
with pm.Model() as model:
    # z_i = 1 if student i studied, 0 if they were guessing
    zi = pm.Bernoulli('zi', p=0.5, shape=len(scores))
    # success probability for students who studied
    phi = pm.Uniform('phi', 0.5, 1, shape=len(scores))
    # success probability for students who guessed
    psi = 0.5
    # theta_i = phi_i if z_i == 1, otherwise psi
    theta = pm.Deterministic('theta', pm.math.eq(zi, 1)*phi + pm.math.eq(zi, 0)*psi)

    pm.Binomial('obs', p=theta, n=40, observed=scores)
    traces = pm.sample(2000, tune=10000, cores=4)

I wanted to see what the difference is between that model and one using PyMC’s Mixture class, so I came up with a different model:

with pm.Model() as mixture_model:
    # mixture weights over the two components
    w = pm.Dirichlet('w', a=np.ones(2))
    # success probability for the "studied" component
    sp = pm.Uniform('sp', 0.5, 1, shape=len(scores))
    dist1 = pm.Binomial.dist(p=sp, n=40, shape=len(scores))                # studied
    dist2 = pm.Binomial.dist(p=[0.5]*len(scores), n=40, shape=len(scores)) # guessing
    mixt = pm.Mixture('mixt', w=w, comp_dists=[dist1, dist2], observed=scores)
    traces = pm.sample(3000, tune=1000, cores=4)
  1. Are these models essentially the same thing? The second model seems a bit limiting in that I can’t estimate the theta value for an individual user.

  2. In the first model, I can answer questions like “What percent of the posterior distribution of theta for a given user is > 0.5?”, which gives me a clue about which group the user belongs to. Is there a way to directly ask what the probability is that a user belongs to group 1 vs group 2?

  3. Does w in the second model represent what % of users belong to each group?

  4. In the second model, how do I answer "What is the probability that user1 belongs to group 1 vs group 2?"

The advantage of using the mixture model is that PyMC will be able to use NUTS for sampling. In your first model, you have discrete stochastic variables (the z’s), so gradient-based sampling will not work.
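
Under the hood, Mixture marginalizes the discrete group indicator out of the likelihood: each score’s likelihood becomes a weighted sum of the two Binomial likelihoods, leaving only continuous parameters for NUTS. Roughly something like this (just a scipy sketch of the idea, not PyMC’s actual implementation):

from scipy.stats import binom

# Per-student marginal likelihood with the group indicator z summed out:
# P(score) = w[0] * Binomial(score | 40, sp) + w[1] * Binomial(score | 40, 0.5)
def marginal_likelihood(score, w, sp, n=40, psi=0.5):
    return w[0] * binom.pmf(score, n, sp) + w[1] * binom.pmf(score, n, psi)

print(marginal_likelihood(21, w=[0.5, 0.5], sp=0.7))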

Regarding your other questions, w does specify the proportion that belong to each group, so the answer to 3 is “yes”, and the answer to 4 is “By inferring from w”.
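
For example, the posterior of the group proportions can be inspected directly; a quick sketch, assuming ArviZ is installed and traces holds the mixture model’s samples:

import arviz as az

# Posterior summary of the mixture weights (one row per group)
az.summary(traces, var_names=["w"])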

Thanks Chris! That makes sense about using NUTS. It’s still not clear to me how I would infer a specific user’s probability of belonging to each group from w. w seems like a group-level parameter (what % of all users belong to group 1 or group 2), but certain users have a higher chance of belonging to one group or the other than the group average. How can you capture this from w?

Hi Chris @fonnesbeck, I’m late to the party on this post but intrigued nonetheless!

you have discrete stochastic variables (the z’s), so gradient-based sampling will not work

A few short questions:

  • I’m a little confused about when NUTS is/isn’t an option. Is it the discreteness of the zi that precludes gradient-based sampling, or the fact that it’s a multilevel model?
  • Does PyMC3 revert to a basic Metropolis sampler (or Gibbs sampler) when gradient info isn’t computable?
  • With the switch from Theano to JAX, will this be a non-issue in months to come?

Cheers!

@jbuddy_13 discrete variables prevent gradient-based sampling because you can’t calculate the derivative with respect to a discrete variable.

Late to the party, but it’s possible to retrieve the discrete variables that are implicitly marginalized by pm.Mixture after sampling is done. We show an example in the last section of this blogpost: Out of model predictions with PyMC - PyMC Labs

  1. Yes. Discrete variables don’t have well-defined gradients (you can’t take an infinitesimal step in a discrete variable).
  2. Yes.
  3. No, the same applies. There’s no switch to JAX though; it’s just a different computational backend that can be used to evaluate expressions.
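
To make that concrete for question 4 above, here is a rough manual sketch of computing each student’s posterior probability of belonging to the “studied” group from the mixture model’s posterior draws (the blogpost linked above shows a fuller example). It assumes traces is the InferenceData returned by pm.sample for the mixture model, and that the component order follows comp_dists=[dist1, dist2], i.e. w[0] is the studied group and w[1] the guessing group:

import numpy as np
from scipy.stats import binom

post = traces.posterior.stack(sample=("chain", "draw"))
w = post["w"].values    # shape (2, n_samples)
sp = post["sp"].values  # shape (n_students, n_samples)

scores_arr = np.array(scores)[:, None]              # shape (n_students, 1)
lik_study = w[0] * binom.pmf(scores_arr, 40, sp)    # studied component
lik_guess = w[1] * binom.pmf(scores_arr, 40, 0.5)   # guessing component

# Posterior membership probability per student, averaged over posterior draws
p_studied = lik_study / (lik_study + lik_guess)
print(p_studied.mean(axis=1))  # one probability per student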