Here is yet another spectrum of Gaussians question…

I am attempting to fit a spectrum of Gaussians. Eventually, I aim to determine the number of Gaussian components using model selection. Eventually, the order of the Gaussian components might be important due to the nature of the model. For now, however, I’m just concerned with the simplest case of fitting a spectrum of Gaussians.

Consider the following generative model and simulated data. I am parameterizing the model in terms of the Gaussian area (called `line_area`

) and full-width at half-maximum line width (called `fwhm`

).

```
import numpy as np
import pymc as pm
print("pymc version:", pm.__version__)
from pymc.distributions.transforms import Ordered
import pytensor.tensor as pt
import arviz as az
import matplotlib.pyplot as plt
rng = np.random.RandomState(seed=1234)
def calc_amplitude(line_area, fwhm):
return line_area / fwhm / np.sqrt(np.pi / (4.0 * np.log(2.0)))
def gaussian(x, amp, fwhm, center):
return amp * pt.exp(-4.0 * np.log(2.0) * (x - center)**2.0 / fwhm**2.0)
x = np.linspace(-100.0, 100.0, 1000)
n_components = 2
line_area_true = np.array([250.0, 150.0])
fwhm_true = np.array([25.0, 35.0])
amplitude_true = calc_amplitude(line_area_true, fwhm_true)
center_true = np.array([-15.0, 25.0])
noise_true = 1.0
observed = noise_true * rng.randn(len(x)) + gaussian(
x[:, None],
amplitude_true,
fwhm_true,
center_true
).eval().sum(axis=1)
fig, ax = plt.subplots()
ax.plot(x, observed, 'k-')
fig.savefig("data.png")
plt.close(fig)
```

Here is my `pymc`

model definition. Note that I put `Gamma`

distribution priors on `line_area`

and `fwhm`

in order to prevent zero-signal models (as `line_area`

goes to zero) and exploding amplitude models (as `fwhm`

goes to zero). Given the labelling degeneracy, the posterior distribution is multi-modal. Assuming that each chain converges to one of the modes, then I can handle the labelling degeneracy in post-processing (e.g., by clustering the posterior samples).

```
coords = {"component": range(n_components)}
with pm.Model(coords=coords) as model:
line_area_norm = pm.Gamma(
"line_area_norm", alpha=2.0, beta=1.0, dims="component"
)
line_area = 100.0 * line_area_norm
fwhm_norm = pm.Gamma(
"fwhm_norm", alpha=2.0, beta=1.0, dims="component"
)
fwhm = 30.0 * fwhm_norm
center_norm = pm.Normal(
"center_norm", mu=0.0, sigma=1.0, dims="component"
)
"""
center_norm = pm.Normal(
"center_norm",
mu=0.0,
sigma=1.0,
dims="component",
transform=Ordered(),
initval=np.linspace(-1.0, 1.0, n_components),
)
"""
center = 20.0 * center_norm
noise = pm.HalfNormal("noise", sigma=1.0)
amplitude = calc_amplitude(line_area, fwhm)
predicted = gaussian(x[:, None], amplitude, fwhm, center).sum(axis=1)
_ = pm.Normal(
"predicted", mu=predicted, sigma=noise, observed=observed
)
```

Samples from the prior predictive look reasonable.

Despite the labeling degeneracy and multi-modal posterior distribution, the model samples fine with NUTS.

```
with model:
trace = pm.sample()
```

And the posterior predictive samples look great.

For some data (e.g., widely separated, narrow Gaussians), however, I have found that NUTS sometimes gets “stuck” in a local minimum of the posterior distribution (i.e., putting both components on the same Gaussian, each with half the amplitude and the same width). Thus, I turned to Sequential Monte Carlo which, based on my understanding, is better equipped for multi-modal posterior distributions and should have no problem with this model.

So, despite NUTS’ success with this simple model, let’s give SMC a try.

```
with model:
trace_smc = pm.sample_smc()
```

**SMC seems to be getting stuck on this simple model. Why?**

I noticed that there seems to be some label switching going on. This doesn’t necessarily matter to me, as I can cluster the posterior samples and assign labels manually *a priori*, but that does not explain why some of these chains are getting “stuck” in narrow regions of the parameter space.

Nonetheless, I tried enforcing an `Ordered`

transformation on the `center`

parameter (see the commented-out code in the model), but then `sample_smc`

raises a `ValueError: cannot convert float NaN to integer`

exception. **Does sample_smc not support Ordered transformations?**