Modeling a Spectrum of Gaussians with SMC

Here is yet another spectrum-of-Gaussians question.

I am attempting to fit a spectrum of Gaussians. Eventually, I aim to determine the number of Gaussian components using model selection, and the order of the components might become important due to the nature of the model. For now, however, I'm just concerned with the simplest case of fitting a spectrum of Gaussians.

Consider the following generative model and simulated data. I am parameterizing the model in terms of the Gaussian area (called line_area) and full-width at half-maximum line width (called fwhm).

import numpy as np
import pymc as pm
print("pymc version:", pm.__version__)

from pymc.distributions.transforms import Ordered
import pytensor.tensor as pt
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.RandomState(seed=1234)

def calc_amplitude(line_area, fwhm):
    # Peak amplitude of a Gaussian given its integrated area and FWHM
    return line_area / fwhm / np.sqrt(np.pi / (4.0 * np.log(2.0)))

def gaussian(x, amp, fwhm, center):
    # Gaussian profile parameterized by peak amplitude, FWHM, and center
    return amp * pt.exp(-4.0 * np.log(2.0) * (x - center)**2.0 / fwhm**2.0)

x = np.linspace(-100.0, 100.0, 1000)
n_components = 2
line_area_true = np.array([250.0, 150.0])
fwhm_true = np.array([25.0, 35.0])
amplitude_true = calc_amplitude(line_area_true, fwhm_true)
center_true = np.array([-15.0, 25.0])
noise_true = 1.0
observed =  noise_true * rng.randn(len(x)) + gaussian(
    x[:, None],
    amplitude_true,
    fwhm_true,
    center_true
).eval().sum(axis=1)

fig, ax = plt.subplots()
ax.plot(x, observed, 'k-')
fig.savefig("data.png")
plt.close(fig)

(Figure: simulated two-component spectrum, data.png)

Here is my PyMC model definition. Note that I put Gamma priors on line_area and fwhm in order to prevent zero-signal models (as line_area goes to zero) and exploding-amplitude models (as fwhm goes to zero). Given the labeling degeneracy, the posterior distribution is multi-modal. Assuming that each chain converges to one of the modes, I can handle the labeling degeneracy in post-processing (e.g., by clustering the posterior samples).

coords = {"component": range(n_components)}
with pm.Model(coords=coords) as model:
    line_area_norm = pm.Gamma(
        "line_area_norm", alpha=2.0, beta=1.0, dims="component"
    )
    line_area = 100.0 * line_area_norm

    fwhm_norm = pm.Gamma(
        "fwhm_norm", alpha=2.0, beta=1.0, dims="component"
    )
    fwhm = 30.0 * fwhm_norm

    center_norm = pm.Normal(
        "center_norm", mu=0.0, sigma=1.0, dims="component"
    )
    """
    center_norm = pm.Normal(
        "center_norm",
        mu=0.0,
        sigma=1.0,
        dims="component",
        transform=Ordered(),
        initval=np.linspace(-1.0, 1.0, n_components),
    )
    """
    center = 20.0 * center_norm

    noise = pm.HalfNormal("noise", sigma=1.0)

    amplitude = calc_amplitude(line_area, fwhm)
    predicted = gaussian(x[:, None], amplitude, fwhm, center).sum(axis=1)
    _ = pm.Normal(
        "predicted", mu=predicted, sigma=noise, observed=observed
    )

Samples from the prior predictive look reasonable.
(Figure: prior predictive samples)

Despite the labeling degeneracy and multi-modal posterior distribution, the model samples fine with NUTS.

with model:
    trace = pm.sample()

And the posterior predictive samples look great.
(Figure: posterior predictive samples)

For some data (e.g., widely separated, narrow Gaussians), however, I have found that NUTS sometimes gets “stuck” in a local mode of the posterior distribution (i.e., putting both components on the same Gaussian, each with half the amplitude and the same width). Thus, I turned to Sequential Monte Carlo, which, based on my understanding, is better equipped for multi-modal posterior distributions and should have no problem with this model.

So, despite NUTS’ success with this simple model, let’s give SMC a try.

with model:
    trace_smc = pm.sample_smc()

SMC seems to be getting stuck on this simple model. Why?

I noticed that there seems to be some label switching going on. This doesn’t necessarily matter to me, since I can cluster the posterior samples and assign labels manually in post-processing, but it does not explain why some of these chains are getting “stuck” in narrow regions of the parameter space.
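
As a rough sketch of that kind of post-processing (illustrative, not from the original post, and assuming the trace_smc object and variable names from the model above), one simple option is to relabel each posterior draw by sorting the components on their center:

center_draws = trace_smc.posterior["center_norm"].values   # (chain, draw, component)
order = np.argsort(center_draws, axis=-1)                   # per-draw component ordering

relabeled = {
    name: np.take_along_axis(trace_smc.posterior[name].values, order, axis=-1)
    for name in ["line_area_norm", "fwhm_norm", "center_norm"]
}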

Nonetheless, I tried enforcing an Ordered transformation on the center parameter (see the commented-out code in the model), but then sample_smc raises ValueError: cannot convert float NaN to integer. Does sample_smc not support Ordered transformations?

For Ordered you may need to provide a custom valid initial point; this is sometimes also a problem in NUTS.

The ValueError occurs even with initval (see the commented-out code in the model).

Digging into the code for SMC a bit, the function that constructs the initial population doesn’t appear to respect custom initial values, so you’re going to get an error when you try to use Ordered (unless you luck out and every single initial draw from the prior happens to already be ordered). In particular, this line:

value = transform.forward(value, *variable.owner.inputs)

This line assumes that the values are already in the constrained space (as forward is the transformation to the unconstrained space). For Ordered, that means the values must be strictly increasing – see here, and note the log applied to the successive differences. This isn’t going to be the case for draws from a Normal centered on zero, unless you get supremely lucky.
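
To make that concrete, here is a quick illustration (mine, not from the thread; transform internals may differ between PyMC versions) showing that the forward transform of an unordered draw produces a NaN, since the log of a negative difference is undefined – which is consistent with the "cannot convert float NaN to integer" error above:

unordered_draw = np.array([1.0, 0.5])              # not increasing
print(Ordered().forward(unordered_draw).eval())    # -> [1., nan]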

A better solution for generating ordered draws would be to apply a function to the samples themselves that will enforce the ordering. See here for a discussion.
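
For example (an illustrative sketch of mine; how such draws get handed to sample_smc depends on the PyMC version), sorting each prior draw along the component axis is one function that enforces the ordering:

raw = rng.normal(0.0, 1.0, size=(1000, n_components))   # unconstrained prior draws
ordered_draws = np.sort(raw, axis=-1)                    # now valid input for the forward transform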

Thanks for the help, @ricardoV94 and @jessegrabowski!

I investigated SMC’s failure to respect initval, and I submitted a bug report and PR here:

With this fix to SMC, the sampler handles this model just fine. So indeed it was the label switching that was confusing SMC. Thanks again!

Often you can do prediction for quantities of interest without having to assign clusters. For example, you can answer questions like the probability that two data items are in the same cluster, or you can do posterior predictive data generation or prediction.

The only multimodality here is the collapse of the two components onto each other and the label switching. You should be able to prevent the former by initializing them apart. The same collapse problem can arise in SMC.
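
In PyMC, one way to initialize the chains apart (a sketch under the model above, not part of the original reply) is to pass initvals to pm.sample:

with model:
    trace = pm.sample(
        # start the two centers well separated to discourage the collapsed mode
        initvals={"center_norm": np.array([-1.0, 1.0])},
    )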

That’s one way to go, but it causes implicit label switching and thus can be hard to adapt and sample. An alternative that would be kosher according to the “generative” intentions of PyMC would be to do something like z ~ lognormal(mu, sigma) (or some other positive-constrained prior) and then set y = cumulative_sum(z). In Stan, we just transform with cumulative_sum(exp(x)), where x is unconstrained, apply the Jacobian correction, and then put a prior directly on y.
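
A minimal PyMC sketch of that generative construction (illustrative, not from the original reply): the ordering holds by construction because every increment is positive.

with pm.Model() as ordered_sketch:
    z = pm.LogNormal("z", mu=0.0, sigma=1.0, shape=2)   # positive increments
    y = pm.Deterministic("y", pm.math.cumsum(z))        # strictly increasing

To let the first element be negative, add an unconstrained offset (or a fixed lower bound) before the cumulative sum, as in the model further down.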

May I ask whether you have a way to reason about priors on transformed quantities like these, when the constrained variable often cannot respect the prior due to truncation? (It totally makes sense to me in the cases where it can, like a log transform for positive priors.)

In most cases, they’re quantities that people are familiar with such as positive-constrained values, probabilities, or covariance matrices, where standard distributions can be used or users can make their own up. We have a whole Wiki full of prior advice, as well as a lot of advice in our User’s Guide.

After the Jacobian correction, Stan assigns all parameters a (potentially improper) uniform prior distribution. Any prior on top of this is just added in on the log scale. This is how all of Stan works—the declared variables get uniform support and if you want a different prior, you add its log density to the target density.

For ordered vectors like this, we’re often using them for cutpoints in an ordinal regression. In that case, we usually don’t want them to collapse to the same values, so we’d use something like a zero-avoiding positive-constrained prior on the differences. Something like

vector<lower=0>[N] y;
...
y[2:] - y[1:N-1] ~ lognormal(0, 1);

That’s still only putting a prior on N - 1 degrees of freedom, so it’s still not proper. For that, you’d want to add a distribution on one of the y values, such as

y[1] ~ ...;

Actually, I spoke too soon! Notice that the trace plots for my “solution” converge to center_norm = [-1, 1]. They’re stuck at the initval… I think enforcing initval in this way mucks up SMC’s algorithm (tempering from the prior to the posterior).

Thank you for the useful insight @bob-carpenter !

True. In my case, I’m just interested in the summary statistics across all chains, or even R-hat. If the labels are mismatched between chains, then these statistics are meaningless.

Yes, this usually works. But I really want a solution that always works.

The truth about my models is that, depending on the component parameters, the order of the components could be irrelevant (labeling degeneracy), uniquely determined (no labeling degeneracy), or a mix of the two scenarios.

Is there some other way to parameterize a model like this? Like, could the order of the components somehow be another RV to be inferred? Would that make it easier or harder to sample?

Thank you for this suggestion, @bob-carpenter. I have experimented with such a model and have found good results!

coords = {"component": range(n_components)}
with pm.Model(coords=coords) as model:
    line_area_norm = pm.Gamma(
        "line_area_norm", alpha=2.0, beta=1.0, dims="component"
    )
    line_area = 100.0 * line_area_norm

    fwhm_norm = pm.Gamma(
        "fwhm_norm", alpha=2.0, beta=1.0, dims="component"
    )
    fwhm = 30.0 * fwhm_norm

    # Parameterize the centers as a lower bound plus cumulative sums of
    # positive offsets, so they are ordered by construction.
    center_lower = -20.0
    center_offset_norm = pm.HalfNormal(
        "center_offset_norm",
        sigma=1.0,
        dims="component",
    )
    center_offset = 20.0 * center_offset_norm
    center = pm.Deterministic(
        "center",
        center_lower + pm.math.cumsum(center_offset, axis=0),
        dims="component"
    )
    
    noise = pm.HalfNormal("noise", sigma=1.0)

    amplitude = calc_amplitude(line_area, fwhm)
    predicted = gaussian(x[:, None], amplitude, fwhm, center).sum(axis=1)
    _ = pm.Normal(
        "predicted", mu=predicted, sigma=noise, observed=observed
    )

The model samples fine with both NUTS and SMC. Some NUTS chains still collapse both components onto a single Gaussian, but SMC seems to find both modes just fine.
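
For reference, this is roughly how I run both samplers on the reparameterized model and eyeball the centers (a usage sketch, not verbatim from my runs):

with model:
    trace_nuts = pm.sample()
    trace_smc = pm.sample_smc()

az.plot_trace(trace_smc, var_names=["center"])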

That’s true if you use the cluster identities in the summary statistics. But most of what one cares about from clustering is not about summary statistics. You can calculate things like the data log density (ELPD) just fine for leave-one-out cross-validation, for example. The probability that two elements are in the same cluster can also be calculated with posterior predictive inference, as can the density of a new point. You can do posterior predictive checks no problem. None of this runs into label switching because it marginalizes over the label identity, so it should all lead to reasonable ESS and R-hat values.
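
For concreteness, a sketch (not from the original reply, and assuming a recent PyMC/ArviZ where pm.compute_log_likelihood and az.loo are available) of computing one such label-invariant quantity, the ELPD via LOO, plus posterior predictive draws, for the reparameterized model and trace_smc above:

with model:
    pm.compute_log_likelihood(trace_smc)             # pointwise log-likelihood needed for LOO
    ppc = pm.sample_posterior_predictive(trace_smc)  # for posterior predictive checks

print(az.loo(trace_smc))                             # unaffected by label switching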