Modeling a Spectrum of Gaussians with SMC

Here is yet another spectrum-of-Gaussians question.

I am attempting to fit a spectrum of Gaussians. Eventually, I aim to determine the number of Gaussian components using model selection, and the order of the components might become important due to the nature of the model. For now, however, I'm just concerned with the simplest case of fitting a spectrum of Gaussians.

Consider the following generative model and simulated data. I am parameterizing the model in terms of the Gaussian area (called line_area) and full-width at half-maximum line width (called fwhm).

import numpy as np
import pymc as pm
print("pymc version:", pm.__version__)

from pymc.distributions.transforms import Ordered
import pytensor.tensor as pt
import arviz as az
import matplotlib.pyplot as plt

rng = np.random.RandomState(seed=1234)

def calc_amplitude(line_area, fwhm):
    # Peak amplitude of a Gaussian given its integrated area and FWHM
    return line_area / fwhm / np.sqrt(np.pi / (4.0 * np.log(2.0)))

def gaussian(x, amp, fwhm, center):
    # Gaussian profile parameterized by peak amplitude, FWHM, and center
    return amp * pt.exp(-4.0 * np.log(2.0) * (x - center)**2.0 / fwhm**2.0)

x = np.linspace(-100.0, 100.0, 1000)
n_components = 2
line_area_true = np.array([250.0, 150.0])
fwhm_true = np.array([25.0, 35.0])
amplitude_true = calc_amplitude(line_area_true, fwhm_true)
center_true = np.array([-15.0, 25.0])
noise_true = 1.0
observed =  noise_true * rng.randn(len(x)) + gaussian(
    x[:, None],
    amplitude_true,
    fwhm_true,
    center_true
).eval().sum(axis=1)

fig, ax = plt.subplots()
ax.plot(x, observed, 'k-')
fig.savefig("data.png")
plt.close(fig)

(Figure: simulated two-component spectrum, data.png)

Here is my PyMC model definition. Note that I put Gamma priors on line_area and fwhm in order to prevent zero-signal models (as line_area goes to zero) and exploding-amplitude models (as fwhm goes to zero). Given the labeling degeneracy, the posterior distribution is multi-modal. Assuming that each chain converges to one of the modes, I can handle the labeling degeneracy in post-processing (e.g., by clustering the posterior samples).

coords = {"component": range(n_components)}
with pm.Model(coords=coords) as model:
    line_area_norm = pm.Gamma(
        "line_area_norm", alpha=2.0, beta=1.0, dims="component"
    )
    line_area = 100.0 * line_area_norm

    fwhm_norm = pm.Gamma(
        "fwhm_norm", alpha=2.0, beta=1.0, dims="component"
    )
    fwhm = 30.0 * fwhm_norm

    center_norm = pm.Normal(
        "center_norm", mu=0.0, sigma=1.0, dims="component"
    )
    """
    center_norm = pm.Normal(
        "center_norm",
        mu=0.0,
        sigma=1.0,
        dims="component",
        transform=Ordered(),
        initval=np.linspace(-1.0, 1.0, n_components),
    )
    """
    center = 20.0 * center_norm

    noise = pm.HalfNormal("noise", sigma=1.0)

    amplitude = calc_amplitude(line_area, fwhm)
    predicted = gaussian(x[:, None], amplitude, fwhm, center).sum(axis=1)
    _ = pm.Normal(
        "predicted", mu=predicted, sigma=noise, observed=observed
    )

Samples from the prior predictive look reasonable.
(Figure: prior predictive samples)

Despite the labeling degeneracy and multi-modal posterior distribution, the model samples fine with NUTS.

with model:
    trace = pm.sample()

And the posterior predictive samples look great.
(Figure: posterior predictive samples)

For some data (e.g., widely separated, narrow Gaussians), however, I have found that NUTS sometimes gets “stuck” in a local mode of the posterior distribution (i.e., putting both components on the same Gaussian, each with half the amplitude and the same width). Thus, I turned to Sequential Monte Carlo, which, based on my understanding, is better equipped for multi-modal posterior distributions and should have no problem with this model.

So, despite NUTS’ success with this simple model, let’s give SMC a try.

with model:
    trace_smc = pm.sample_smc()

SMC seems to be getting stuck on this simple model. Why?

I noticed that there seems to be some label switching going on. This doesn’t necessarily matter to me, since I can cluster the posterior samples and assign labels manually in post-processing, but it does not explain why some of these chains are getting “stuck” in narrow regions of the parameter space.
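
As a rough sketch of that kind of post-processing (illustrative, not from the original post, and assuming the trace_smc object and variable names from the model above), one simple option is to relabel each posterior draw by sorting the components on their center:

center_draws = trace_smc.posterior["center_norm"].values   # (chain, draw, component)
order = np.argsort(center_draws, axis=-1)                   # per-draw component ordering

relabeled = {
    name: np.take_along_axis(trace_smc.posterior[name].values, order, axis=-1)
    for name in ["line_area_norm", "fwhm_norm", "center_norm"]
}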

Nonetheless, I tried enforcing an Ordered transformation on the center parameter (see the commented-out code in the model), but then sample_smc raises ValueError: cannot convert float NaN to integer. Does sample_smc not support Ordered transformations?

For Ordered you may need to provide a custom valid initial point; this is sometimes also a problem in NUTS.

The ValueError occurs even with initval (see the commented-out code in the model).

Digging into the code for SMC a bit, the function that constructs the initial population doesn’t appear to respect custom initial values, so you’re going to get an error when you try to use Ordered (unless you luck out and every single initial draw from the prior happens to already be ordered). In particular, this line:

value = transform.forward(value, *variable.owner.inputs)

This line assumes that the values are already in the constrained space (as forward is the transformation to the unconstrained space). For Ordered, that means the values must be strictly increasing – see here, and note the log applied to the successive differences. This isn’t going to be the case for draws from a Normal centered on zero, unless you get supremely lucky.
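
To make that concrete, here is a quick illustration (mine, not from the thread; transform internals may differ between PyMC versions) showing that the forward transform of an unordered draw produces a NaN, since the log of a negative difference is undefined – which is consistent with the "cannot convert float NaN to integer" error above:

unordered_draw = np.array([1.0, 0.5])              # not increasing
print(Ordered().forward(unordered_draw).eval())    # -> [1., nan]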

A better solution for generating ordered draws would be to apply a function to the samples themselves that will enforce the ordering. See here for a discussion.
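
For example (an illustrative sketch of mine; how such draws get handed to sample_smc depends on the PyMC version), sorting each prior draw along the component axis is one function that enforces the ordering:

raw = rng.normal(0.0, 1.0, size=(1000, n_components))   # unconstrained prior draws
ordered_draws = np.sort(raw, axis=-1)                    # now valid input for the forward transform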

Thanks for the help, @ricardoV94 and @jessegrabowski!

I investigated SMC’s failure to respect initval, and I submitted a bug report and PR here:

With this fix to SMC, the sampler handles this model just fine. So indeed it was the label switching that was confusing SMC. Thanks again!

Often you can do prediction for quantities of interest without having to assign clusters. For example, you can answer questions like the probability that two data items are in the same cluster, or you can do posterior predictive data generation or prediction.

The only multimodality here is the collapse of the two components onto each other and the label switching. You should be able to prevent the former by initializing them apart. The same collapse problem can arise in SMC.
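
In PyMC, one way to initialize the chains apart (a sketch under the model above, not part of the original reply) is to pass initvals to pm.sample:

with model:
    trace = pm.sample(
        # start the two centers well separated to discourage the collapsed mode
        initvals={"center_norm": np.array([-1.0, 1.0])},
    )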

That’s one way to go, but it causes implicit label switching and thus can be hard to adapt and sample. An alternative that would be kosher according to the “generative” intentions of PyMC would be to do something like z ~ lognormal(mu, sigma) (or some other positive-constrained prior) and then set y = cumulative_sum(z). In Stan, we just transform with cumulative_sum(exp(x)), where x is unconstrained, apply the Jacobian correction, and then put a prior directly on y.
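
A minimal PyMC sketch of that generative construction (illustrative, not from the original reply): the ordering holds by construction because every increment is positive.

with pm.Model() as ordered_sketch:
    z = pm.LogNormal("z", mu=0.0, sigma=1.0, shape=2)   # positive increments
    y = pm.Deterministic("y", pm.math.cumsum(z))        # strictly increasing

To let the first element be negative, add an unconstrained offset (or a fixed lower bound) before the cumulative sum, as in the model further down.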

May I ask whether you have a way to reason about priors on transformed quantities like these, when the constrained variable often cannot respect the prior due to truncation? (It totally makes sense to me in the cases where it can, like a log transform for positive priors.)

In most cases, they’re quantities that people are familiar with such as positive-constrained values, probabilities, or covariance matrices, where standard distributions can be used or users can make their own up. We have a whole Wiki full of prior advice, as well as a lot of advice in our User’s Guide.

After the Jacobian correction, Stan assigns all parameters a (potentially improper) uniform prior distribution. Any prior on top of this is just added in on the log scale. This is how all of Stan works—the declared variables get uniform support and if you want a different prior, you add its log density to the target density.

For ordered vectors like this, we’re often using them for cutpoints in an ordinal regression. In that case, we usually don’t want them to collapse to the same values, so we’d use something like a zero-avoiding positive-constrained prior on the differences. Something like

vector<lower=0>[N] y;
...
y[2:] - y[1:N-1] ~ lognormal(0, 1);

That’s still only putting a prior on N - 1 degrees of freedom, so it’s still not proper. For that, you’d want to add a distribution on one of the y values, such as

y[1] ~ ...;

Actually, I spoke too soon! Notice that the trace plots for my “solution” converge to center_norm = [-1, 1]. They’re stuck at the initval… I think enforcing initval in this way mucks up SMC’s algorithm (tempering from the prior to the posterior).

Thank you for the useful insight @bob-carpenter !

True. In my case, I’m just interested in the summary statistics across all chains, or even R-hat. If the labels are mismatched between chains, then these statistics are meaningless.

Yes, this usually works. But I really want a solution that always works.

The truth about my models is that, depending on the component parameters, the order of the components could be irrelevant (labeling degeneracy), uniquely determined (no labeling degeneracy), or a mix of the two scenarios.

Is there some other way to parameterize a model like this? Like, could the order of the components somehow be another RV to be inferred? Would that make it easier or harder to sample?

Thank you for this suggestion, @bob-carpenter. I have experimented with such a model and have found good results!

coords = {"component": range(n_components)}
with pm.Model(coords=coords) as model:
    line_area_norm = pm.Gamma(
        "line_area_norm", alpha=2.0, beta=1.0, dims="component"
    )
    line_area = 100.0 * line_area_norm

    fwhm_norm = pm.Gamma(
        "fwhm_norm", alpha=2.0, beta=1.0, dims="component"
    )
    fwhm = 30.0 * fwhm_norm

    # Parameterize the centers as a lower bound plus cumulative sums of
    # positive offsets, so they are ordered by construction.
    center_lower = -20.0
    center_offset_norm = pm.HalfNormal(
        "center_offset_norm",
        sigma=1.0,
        dims="component",
    )
    center_offset = 20.0 * center_offset_norm
    center = pm.Deterministic(
        "center",
        center_lower + pm.math.cumsum(center_offset, axis=0),
        dims="component"
    )
    
    noise = pm.HalfNormal("noise", sigma=1.0)

    amplitude = calc_amplitude(line_area, fwhm)
    predicted = gaussian(x[:, None], amplitude, fwhm, center).sum(axis=1)
    _ = pm.Normal(
        "predicted", mu=predicted, sigma=noise, observed=observed
    )

The model samples fine with both NUTS and SMC. Some NUTS chains still collapse both components onto a single Gaussian, but SMC seems to find both modes just fine.
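
For reference, this is roughly how I run both samplers on the reparameterized model and eyeball the centers (a usage sketch, not verbatim from my runs):

with model:
    trace_nuts = pm.sample()
    trace_smc = pm.sample_smc()

az.plot_trace(trace_smc, var_names=["center"])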

That’s true if you use the cluster identities in the summary statistics. But most of what one cares about from clustering is not about summary statistics. You can calculate things like the data log density (ELPD) just fine for leave-one-out cross-validation, for example. The probability that two elements are in the same cluster can also be calculated with posterior predictive inference, as can the density of a new point. You can do posterior predictive checks no problem. None of this runs into label switching because it marginalizes over the label identity, so it should all lead to reasonable ESS and R-hat values.
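
For concreteness, a sketch (not from the original reply, and assuming a recent PyMC/ArviZ where pm.compute_log_likelihood and az.loo are available) of computing one such label-invariant quantity, the ELPD via LOO, plus posterior predictive draws, for the reparameterized model and trace_smc above:

with model:
    pm.compute_log_likelihood(trace_smc)             # pointwise log-likelihood needed for LOO
    ppc = pm.sample_posterior_predictive(trace_smc)  # for posterior predictive checks

print(az.loo(trace_smc))                             # unaffected by label switching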