Mixture with multiple observations

Hello, I’m trying to figure out the right way to generate model for this data in pymc3:

2017-12-08 15_21_22-Clipboard

So I have about 70 observations, and only 1 of them is informative. I have a mix of 2 classes which have different informative observations.

I’m able to capture the important observation when it’s only 1 (not a mixture) with the code below. Can anybody give me a hint how to consider it for a mixture? I don’t need a detailed solution, even a hint would be so helpful!

with pm.Model() as model:
    #pm.GLM.from_formula('y ~ x', y)
    mu_a = pm.Normal('mu_a', mu=0., sd=100.**2)
    sigma_a = pm.HalfCauchy('sigma_a', 5.)
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=C)
    eps = pm.HalfCauchy('eps',5)
    y_est = pm.Deterministic('y_est',a)
    data_likelihood = pm.Normal('data_likelihood', mu=y_est, sd=eps, observed=y)

    mode = pm.find_MAP()
    trace = pm.sample(draws=1000, tune=200)


example_data.csv (365.7 KB)

This reminds me of another post here

You can have a look at the notebook https://gist.github.com/junpenglao/1907bf019906c125f63126ec4bf23880#file-mixture_discourse-ipynb

I think extending the model above into a softmax should solve the problem.

Thanks @junpenglao! This is a huge help and I’m able to run my example based off the non-marginalized mixture. I’m still working on figuring out how the marginalized mixture works because the performance involving categorical variables is pretty brutal.

1 Like

Hi @junpenglao. I’m trying to extend your notebook to the multivariate case that I have but I can’t seem to get the dimensions correct. Can I still use the pm.NormalMixture function when I have multiple observations like in the picture above? I’m not sure where to apply softmax because my output data is a 2-D array and not a vector of classes.

handling shape in pm.NormalMixture might be tricky, maybe try reshaping the output data into a vector (also the mu and sd for the mixture)

I have something close that’s giving my best result yet. It’s using a DensityDist like the example from Gaussian Mixture Model with ADVI in the PyMC3 docs. I’d like to try your idea of applying a softmax constraint, but I’m not sure where to do that. Is that an additional factor in the “logps” variable? I have a github notebook modified to use my data if you’d like to take a peek :).

Hi @chris,
Actually, I don’t think my suggestion of using a softmax is well thought. Looking at the data generation process, each row is not necessary sum to 1 right? If you have K > 2 you can try modeling the pi using a softmax, but I am not sure it is better than using Dirichlet (which use a stick-breaking process internally).

1 Like