Fitting a spectra of gaussians

Ultimately I would like to do model selection to find the number of atomic emission spectral lines in a measured spectrum. I would first like to be able to specify the number of gaussians to fit to a spectrum. I see there is a normal mixture and maybe thats what I should be using but I can’t make sense of it. I have been able to fit data with two peaks with the but with the code below I would have to make a function every time I want to specify a different number of gaussians. What I want is a way to have a way to specify an arbitrary number of gaussians.

def gaussian(x,amp,cent,sig):
    return amp*np.exp( -1*(x-cent)**2/2/sig**2)

amp = 1
cent = 0
sig = 0.1
x = np.linspace(-2,2,100)
f_true = gaussian(x,amp,cent,sig) + gaussian(x,1,-1,sig)
noise = np.random.normal(size=len(x))*.002
f = f_true + noise

with Model() as gaussian_model:

    #Parameters with priors
    amp = Uniform('amplitude',lower=0,upper=2)
    cent= Uniform('centroid',lower=-2,upper=2)
    sig = Uniform('sigma',lower=0, upper=4)
    amp2 = Uniform('amplitude2',lower=0,upper=2)
    cent2= Uniform('centroid2',lower=-2,upper=2)
    sig2 = Uniform('sigma2',lower=0, upper=4)

    sigma = Uniform('sigmat',lower=0,upper=1)

    gauss = Deterministic('gauss',amp*np.exp( -1*(cent -x)**2/2/sig**2)+amp2*np.exp( -1*(cent2 -x)**2/2/sig2**2))
    likeli = Normal('y', mu = gauss, sd=sigma,observed = f)

    start=find_MAP()
    step=NUTS()
    trace=sample(1000,step,start=start)

I tried to do something like the snippet below but this caused a huge slow down as well as a terrible fit.

def multi_gaussian(x,amp,cent,sig):
    function=0
    for i in range(0,len(amp)):
        function = function + amp[i]*np.exp(-1*(x-cent[i])**2/2/sig[i]**2)

You shouldn’t be using NormalMixture for this kind of data. In short, if you plot the histogram of the data and it looks like a mixture of gaussian, then you should use NormalMixture. Otherwise, what you are trying to do is estimating some parameter of a nonlinear function.

What you are doing is on the right track, trying to implement a better mixture gaussian function should help:

from pymc3.math import logsumexp
import theano.tensor as tt
def mixture_density(w, mu, sd, x):
    logp = tt.log(w) + pm.Normal.dist(mu, sd).logp(x)
    return tt.sum(tt.exp(logp))

w, mu, sd should be theano vector. So you will need to do for example: w = tt.stack([amp, amp2])

Thanks, I think I should be able to work with this. Did I just miss documentation on this? I searched for a few days and came up with nothing.

Not for this specific usage. We have a few examples on mixture models but that’s more to model the distribution (historgram) of data.

I am confused what is going on with the theano mixture model. How does it know the grid to put the function on? It looks like ‘x’ in your mixture model is not the same x array that I have in the code above?

Are there any examples of fitting non linear functions? I cannot find any after a pretty extensive search.

It is determined by the input x, as the function evaluates on x_1, x_2,…, x_n

You are right there is no good collection of examples dedicated to fitting non-linear functions. I put up a small Gist you might find helpful: Nonlinear_funcs.ipynb · GitHub

Interestingly, rerunning the last cell in the gist with sample(random_seed=1), it appears that sometimes the two chains disagree on which Gaussian is which:

>>> values = trace.get_values('mu', combine=False)
>>> values[0][-1], values[1][-1]
(array([-1.05009326,  2.59436901]), array([ 2.59135272, -1.01800517]))

This results in r_hat values on each of the n_shape=2 parameters (mu, w, sd) of 1.83 and warnings. Is there anything that can be done to avoid that?