I am trying to build a model that seemingly consists of two normal distributions. My data, however, is binned, so I believe a Gaussian mixture model cannot be used directly(?). Is there a way to fit two normal distributions to binned data like this in PyMC3?
There is definitely a less hacky way to do this, but what about just sampling N_b points uniformly from the interval within each bin b and then using the GMM directly?
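A minimal sketch of that idea, using made-up bin edges and counts (the names `edges` and `counts` are hypothetical, standing in for whatever binned data you have):

```python
import numpy as np

# Hypothetical binned data: bin edges and the count observed in each bin
edges = np.array([-3., -2., -1., 0., 1., 2., 3.])
counts = np.array([5, 20, 40, 35, 15, 5])

rng = np.random.default_rng(0)
# For each bin b, draw counts[b] points uniformly from [edges[b], edges[b+1])
samples = np.concatenate([
    rng.uniform(lo, hi, size=n)
    for lo, hi, n in zip(edges[:-1], edges[1:], counts)
])
# 'samples' can now be passed as observed data to a standard NormalMixture
```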
Hi, thanks for the reply! I feel this is on the hackier side of things…
You can also try a curve fitting with the curve being the pdf of a GMM.
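For instance, a quick non-Bayesian sketch of that idea with scipy's curve_fit, where the data and bin setup are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def gmm_pdf(x, w1, mu1, sd1, mu2, sd2):
    # pdf of a two-component Gaussian mixture; the two weights sum to 1
    return w1 * norm.pdf(x, mu1, sd1) + (1 - w1) * norm.pdf(x, mu2, sd2)

# Made-up data, binned into a density-normalized histogram
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2., 0.5, 500), rng.normal(1., 1., 500)])
heights, edges = np.histogram(data, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit the mixture pdf to the bin heights at the bin centers
popt, _ = curve_fit(gmm_pdf, centers, heights, p0=[0.5, -1., 1., 1., 1.])
```

The same curve (the mixture pdf, optionally with a scaling factor) is what the PyMC models further down in this thread fit, but with priors and uncertainty estimates instead of a point estimate.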
You might find this discussion helpful: Fitting a spectra of gaussians
Hi again, Thank you for the suggestions!
I found the linked discussion very useful, and your notebook example using a mixture_density function was exactly what I was looking for.
I am hoping you can help me clarify one thing, @junpenglao:
I was testing robustness against different magnitudes of the bin heights, so I gave a histogram with density=True as my data, essentially scaling down the bin heights. The model is not able to approximate this, nor is it able to find a fit when I scale the bin heights by, say, 100. I have tried changing my w prior to compensate for this, but I am getting very strange results.
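For reference, the density=True heights differ from the raw counts only by a constant factor, as this small numpy illustration with made-up data shows:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0., 1., size=1000)

counts, edges = np.histogram(data, bins=20)          # raw counts
dens, _ = np.histogram(data, bins=20, density=True)  # normalized heights

# With equal-width bins: density = counts / (N * bin_width),
# i.e. density=True is just a constant rescaling of the counts
width = edges[1] - edges[0]
print(np.allclose(dens, counts / (len(data) * width)))  # True
```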
Is this something that you can explain?
In advance, thank you so much for answering!
You probably should adjust the prior - but if you have access to the binning it would be easier to fit a GMM directly, unless I am missing something here?
Hi, thanks again for getting back to me.
By access to the binning I guess you mean the original data that is being binned, but in a real-life case I do not have access to this data, only the bin results.
I have tried changing your mixture density example slightly and am now running the following code, where the observed data is the bin heights:
def mixture_density(w, mu, sd, x):
    logp = pm.NormalMixture.dist(w, mu, sd).logp(x)
    return tt.exp(logp)

with m:
    w = pm.Dirichlet('w', np.ones_like(centers) * .5)
    mu = pm.Normal('mu', 0., 5., shape=centers.size)
    tau = pm.HalfCauchy('tau', 1., shape=centers.size)
    y = mixture_density(w, mu, tau, x)
    y_obs = pm.Normal('y_obs', mu=y, observed=df['y'][0])
I am sorry for not understanding what is going wrong, but do you see something that may cause this to fail? It is giving very strange results, especially for the density=True cases of the binning operation.
One thing you can try is to add a scaling factor to the mixture_density, as the histogram (even with density=True) might not give heights that match the pdf of the mixture distribution you specified:
def mixture_density(w, mu, sigma, scaling, x):
    logp = pm.NormalMixture.dist(w, mu, sigma).logp(x)
    return tt.exp(logp) * scaling

with m:
    w = pm.Dirichlet('w', np.ones_like(centers) * .5)
    mu = pm.Normal('mu', 0., 5., shape=centers.size)
    sigma = pm.HalfCauchy('sigma', 1., shape=centers.size)
    scaling = pm.TruncatedNormal('scaling', 1., 0.1, lower=0)
    y = mixture_density(w, mu, sigma, scaling, x)
    y_obs = pm.Normal('y_obs', mu=y, observed=df['y'][0])