Truncated Distributions in Mixture Models?

I am trying to model a mixture distribution that has a pretty strict cut-off at -100. Here’s a brief overview of the data (intuitively, there are 3 centers/clusters).

[Figure: histogram of the data showing the three clusters]

I first add 100 to all my data points so that the cutoff is at 0. The main options I have tried are listed below, with results:

  1. NormalMixture as a base model: this means the model allows points below -100 to exist, which is fine for now. Using the same code as Austin Rochford's model, I get extremely slow NUTS sampling (~1.5 it/s). After reparametrizing from tau to sd, it is much faster. However, the mu components end up with nearly zero sd.
import numpy as np
import pymc3 as pm

with pm.Model() as model:
    # flat Dirichlet prior over the three mixture weights
    w = pm.Dirichlet('w', np.array([1.] * 3))

    # component locations centred on the three visible peaks;
    # parametrized with sd rather than tau for faster sampling
    mu = pm.Normal('mu', [0., 100., 200.], 10., shape=3)
    sd = pm.Exponential('sd', 0.05, shape=3)

    x_obs = pm.NormalMixture('x_obs', w, mu, sd, observed=obs)
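For reference, the sampling and posterior predictive calls look roughly like this (a sketch; obs is the shifted data array loaded from obs.npy):

import matplotlib.pyplot as plt

with model:
    trace = pm.sample(2000, tune=1000)
    # draw posterior predictive samples to compare against the data
    ppc = pm.sample_ppc(trace, samples=500)

plt.hist(ppc['x_obs'].ravel(), bins=100, alpha=0.5, color='blue', label='sample_ppc')
plt.hist(obs, bins=100, alpha=0.5, color='green', label='data')
plt.legend()
plt.show()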

See plots below


[Figure: posterior predictive check - blue = sample_ppc, green = data]

  2. Mixture class: two Normal distributions plus one bounded distribution (e.g. HalfCauchy, Exponential, etc.). The chains don't converge; no image, but the trace wanders like a random walk, with beta failing to sample. See the code below.

import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    w = pm.Dirichlet('w', np.array([1.] * 3))

    # two Normal components for the peaks near 100 and 200
    mu_duel = pm.Normal('mu_duel', 100., 30.)
    sigma_duel = pm.Exponential('sigma_duel', 0.01)
    mu_kill = pm.Normal('mu_kill', 200., 30.)
    sigma_kill = pm.Exponential('sigma_kill', 0.01)
    # scale of the bounded component sitting at the cutoff
    beta = pm.Exponential('beta', 2)

    duel = pm.Normal.dist(mu_duel, sigma_duel)
    kill = pm.Normal.dist(mu_kill, sigma_kill)
    death = pm.HalfCauchy.dist(beta)

    # break label switching by penalizing mu_kill < mu_duel
    order_means_potential = pm.Potential('order_means_potential',
                                         tt.switch(mu_kill - mu_duel < 0, -np.inf, 0))

    x_obs = pm.Mixture('x_obs', w, [death, duel, kill], observed=obs)
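As a side note, the -inf potential creates a hard, non-differentiable boundary for NUTS; newer PyMC3 versions ship an ordered transform that enforces the same constraint smoothly. A sketch, assuming pymc3.distributions.transforms is available in your version:

import numpy as np
import pymc3 as pm
import pymc3.distributions.transforms as tr

with pm.Model() as ordered_model:
    # mu[0] < mu[1] is enforced by the transform itself,
    # so no order_means_potential is needed
    mu = pm.Normal('mu', mu=150., sd=50., shape=2,
                   transform=tr.ordered, testval=np.array([100., 200.]))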

Attached is my dataset: obs.npy (16.5 KB)

Is there anything I can do? Should I try a non-marginalized version of the Gaussian mixture?
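For reference, the non-marginalized version I have in mind would look roughly like this (a sketch; the discrete z assignments cannot be sampled with NUTS, so PyMC3 would assign a CategoricalGibbsMetropolis step to them, which usually mixes worse than the marginalized NormalMixture):

import numpy as np
import pymc3 as pm

with pm.Model() as latent_model:
    w = pm.Dirichlet('w', np.array([1.] * 3))
    mu = pm.Normal('mu', [0., 100., 200.], 10., shape=3)
    sd = pm.Exponential('sd', 0.05, shape=3)

    # explicit latent cluster assignment for every observation
    z = pm.Categorical('z', p=w, shape=len(obs))
    x_obs = pm.Normal('x_obs', mu=mu[z], sd=sd[z], observed=obs)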

Hmm, a quick observation: I don't think your data will be described well by a Gaussian mixture. As shown in the histogram below (blue is the raw data, plotted with more bins), there are 3 strong peaks, which the Gaussian mixture model only more or less captures (green histogram).
[Figure: finer-binned histogram - blue = raw data, green = Gaussian mixture fit]

Thanks for your help again. Why do you think a Gaussian mixture wouldn't be appropriate? I started with it as a baseline to see the fit, but it's strange that mu isn't really sampling and stays constant over all samples… That's strange to see, especially since I'd expect more variance around each peak.

The mu does change - you can print the trace to see. It's just that the components are on different scales, so when you plot them on one shared axis the trace looks like a flat line.
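For example (a sketch, assuming trace is from the NormalMixture model above), printing the summary statistics or plotting each component on its own axis makes the movement visible:

import matplotlib.pyplot as plt

print(trace['mu'].mean(axis=0), trace['mu'].std(axis=0))

fig, axes = plt.subplots(3, 1, sharex=True)
for i, ax in enumerate(axes):
    # each component gets its own y-scale
    ax.plot(trace['mu'][:, i])
    ax.set_ylabel('mu[%d]' % i)
plt.show()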

There are strong peaks around 0, 100, and 200 - I have a feeling they would strongly bias the fit. You can try comparing the result with a frequentist fit using scikit-learn.
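A minimal sketch of that comparison (assuming obs is the shifted 1-D data array):

import numpy as np
from sklearn.mixture import GaussianMixture

# frequentist (EM) fit with the same number of components
gmm = GaussianMixture(n_components=3).fit(obs.reshape(-1, 1))
print('means:  ', gmm.means_.ravel())
print('sds:    ', np.sqrt(gmm.covariances_).ravel())
print('weights:', gmm.weights_)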