I am trying to build a model that seemingly consists of two normal distributions. My data, however, is binned, so I believe a Gaussian mixture model cannot be used directly(?). Is there a way to fit two normal distributions to binned data like this in PyMC3?
There is definitely a less hacky way to do this, but what about just sampling N_b points uniformly from the interval within each bin b and then using the GMM directly?
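A minimal sketch of that idea, using made-up bin edges and counts (the names `edges` and `counts` are hypothetical, standing in for whatever binned data you have):

```python
import numpy as np

# Hypothetical binned data: bin edges and the count observed in each bin
edges = np.array([-3., -2., -1., 0., 1., 2., 3.])
counts = np.array([5, 20, 40, 35, 15, 5])

rng = np.random.default_rng(0)
# For each bin b, draw counts[b] points uniformly from [edges[b], edges[b+1])
samples = np.concatenate([
    rng.uniform(lo, hi, size=n)
    for lo, hi, n in zip(edges[:-1], edges[1:], counts)
])
# 'samples' can now be passed as observed data to a standard NormalMixture
```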
Hi, thanks for the reply! I feel this is on the hackier side of things…
You can also try a curve fitting with the curve being the pdf of a GMM.
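For instance, a quick non-Bayesian sketch of that idea with scipy's curve_fit, where the data and bin setup are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def gmm_pdf(x, w1, mu1, sd1, mu2, sd2):
    # pdf of a two-component Gaussian mixture; the two weights sum to 1
    return w1 * norm.pdf(x, mu1, sd1) + (1 - w1) * norm.pdf(x, mu2, sd2)

# Made-up data, binned into a density-normalized histogram
rng = np.random.default_rng(42)
data = np.concatenate([rng.normal(-2., 0.5, 500), rng.normal(1., 1., 500)])
heights, edges = np.histogram(data, bins=30, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Fit the mixture pdf to the bin heights at the bin centers
popt, _ = curve_fit(gmm_pdf, centers, heights, p0=[0.5, -1., 1., 1., 1.])
```

The same curve (the mixture pdf, optionally with a scaling factor) is what the PyMC models further down in this thread fit, but with priors and uncertainty estimates instead of a point estimate.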
You might find this discussion helpful: Fitting a spectra of gaussians
Hi again, Thank you for the suggestions!
I found the linked discussion very useful, and your notebook example using a mixture_density function was exactly what I was looking for.
I am hoping you can help me clarify one thing, @junpenglao:
I was testing robustness against different magnitudes of the bin heights, so I gave a histogram with density=True as my data, essentially scaling down the bin heights. The model is not able to approximate this, nor is it able to find a fit when I scale the bin heights by, say, 100. I have tried changing my w prior to compensate for this, but I am getting very strange results.
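For reference, the density=True heights differ from the raw counts only by a constant factor, as this small numpy illustration with made-up data shows:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(0., 1., size=1000)

counts, edges = np.histogram(data, bins=20)          # raw counts
dens, _ = np.histogram(data, bins=20, density=True)  # normalized heights

# With equal-width bins: density = counts / (N * bin_width),
# i.e. density=True is just a constant rescaling of the counts
width = edges[1] - edges[0]
print(np.allclose(dens, counts / (len(data) * width)))  # True
```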
Is this something that you can explain?
In advance, thank you so much for answering!
You probably should adjust the prior - but if you have access to the binning it would be easier to fit a GMM directly, unless I am missing something here?
Hi, thanks again for getting back to me.
By access to the binning I guess you mean the original data that is being binned, but in a real-life case I do not have access to this data, only the bin results.
I have tried changing your mixture density example slightly and am now running the following code, where the observed data is the bin heights:
def mixture_density(w, mu, sd, x):
    logp = pm.NormalMixture.dist(w, mu, sd).logp(x)
    return tt.exp(logp)

with m:
    w = pm.Dirichlet('w', np.ones_like(centers) * .5)
    mu = pm.Normal('mu', 0., 5., shape=centers.size)
    tau = pm.HalfCauchy('tau', 1., shape=centers.size)
    y = mixture_density(w, mu, tau, x)
    y_obs = pm.Normal('y_obs', mu=y, observed=df['y'][0])
I am sorry for not understanding what is going wrong, but do you see something that may cause this to fail? It is giving very strange results, especially for the density=True cases of the binning operation.
One thing you can try is to add a scaling factor to the mixture_density, as the histogram (even with density=True) might not give heights that match the pdf of the mixture distribution you specified:
def mixture_density(w, mu, sigma, scaling, x):
    logp = pm.NormalMixture.dist(w, mu, sigma).logp(x)
    return tt.exp(logp) * scaling

with m:
    w = pm.Dirichlet('w', np.ones_like(centers) * .5)
    mu = pm.Normal('mu', 0., 5., shape=centers.size)
    sigma = pm.HalfCauchy('sigma', 1., shape=centers.size)
    scaling = pm.TruncatedNormal('scaling', 1., 0.1, lower=0)
    y = mixture_density(w, mu, sigma, scaling, x)
    y_obs = pm.Normal('y_obs', mu=y, observed=df['y'][0])