How to model sinusoids in white gaussian noise

mtwest729 · October 4, 2017, 2:27am

I am trying to a handle on how to use a some of the theano math functions within a probability model.

My first effort was to reconstruct the parameters of a single unknown sinusoid in white Gaussian noise, which is a very common introductory problem in astrostatistics. It also has the advantage of having a semi-analytical Bayesian solution as well as a myriad of alternative classical methods.

If I am using normal likelihood function

X_obs = pm.Normal('X_obs', mu=signal, sd=sigma, observed=X)

is the expression below the proper way to define signal model for PyMC3?

signal = amplt * pm.math.sin(omega * t + phi)

Where the estimated variables are the

line amplitude: amplt
line angular frequency: omega
line phase offset: phi
noise variance: sigma

The variable t are the time samples we record along with the measured values of the time series X. One could separate the above sinusoid into two that are 90 degrees out of phase and potentially simplify the computations, but I am just looking to see if I have got the general gist of model construction in this framework.

bwengals · October 4, 2017, 5:23am

I don’t see anything wrong with what you’re doing here, or how you’re thinking about it. For many straightforward things you can pretty much use theano as you would numpy, just replace np with tt. Have you tried running this model? Do you get reasonable results?

mtwest729 · October 19, 2017, 2:28am

Because the above model runs surprisingly slow, I decided to simplify the model further to be

amplt = pm.Normal(‘amplt’, mu=1, sd=1)
freq = pm.Uniform(‘freq’, 0, 0.5)
mu_x = amplt * pm.math.sin(2*pi * t * freq)
X_obs = pm.Normal(‘X_obs’, mu=mu_x, sd=1, observed=x)

and it still took 18:40 to take 5000 samples with NUTS. The number of data points is 512. After about 2500 samples, it hit on the correct answer pair. I presume that if I ran the simulation longer, the posterior samples would likely be far more weighted to

I guess the underlying question is how can I speed up this inference to something faster than ~4.5 iterations per second. Lots of important problems aren’t a single 0-deg phase sinusoid but a complex frequency structure that may require a rather messy multi-level model. I expect there are methods one can leverage to improve performance but I am still somewhat new to this API.

mtwest729 · October 19, 2017, 4:49am

Following up with a longer run, the results are actually even worse. In the ADVI computation at the beginning, the average ELBO is -861.43, which I presume is pretty poor. The posterior estimates are awful. I am curious if anyone else has the same problem given this is such a simple model. As I have said before, my gut instinct is to believe I am missing some particular step but have yet to figure out what.

aplavin · October 19, 2017, 5:31am

I get perfect results running the model exactly as you specified it:

t = np.linspace(0, 10)
x = np.sin(2*np.pi * t * 0.1)
x += np.random.randn(*x.shape)

with pm.Model() as model:
    amplt = pm.Normal('amplt', mu=1, sd=1)
    freq = pm.Uniform('freq', 0, 0.5)
    mu_x = amplt * pm.math.sin(2*np.pi * t * freq)
    X_obs = pm.Normal('X_obs', mu=mu_x, sd=1, observed=x)
    
    trace = pm.sample()

It takes just about 1 second of sampling to produce 1000 iterations (the default) and posteriors are centered at true values. I think you should post your complete code, including data generation, for comparison - or try mine from above and see if it works well.

mtwest729 · October 20, 2017, 2:39am

Not sure where exactly I was going wrong but @aplavin has ironed out whatever it was. Many thanks.

And now to see if I can expand the above model to include phase, unknown noise variance and then more sinusoids.

aseyboldt · October 20, 2017, 9:54am

Just to let you know, you can do a fft in theano. So if you have equally spaced observations and your prior can be expressed reasonably on the discrete frequency domain, than looking into that might make sense.

aseyboldt · October 20, 2017, 12:53pm

I was a bit curious what prior one could possibly use on the frequency and experimented a little.
It seems that if we set a uniform prior on the angle, and a strong shrinkage prior on each individual magnitude, we get an interesting prior on the functions.

import numpy as np
from scipy import stats, linalg, signal, spatial, fftpack
import matplotlib.pyplot as plt

%matplotlib inline

def polar_to_scipy(intercept, magnitude, angle):
    n = len(magnitude)
    assert len(angle) == n
    real = np.cos(angle)
    imag = np.sin(angle)
    vals = np.zeros(1 + 2 * n)
    vals[0] = intercept
    vals[1::2] = magnitude * real
    vals[2::2] = magnitude * imag
    return vals

n = 100
mag_scale = np.exp(-np.arange(n) / 20)
angle = 2 * np.pi * np.random.rand(n)
magnitude_theta = stats.halfcauchy().rvs(n)
magnitude_eta = np.abs(np.random.randn(n))
magnitude = magnitude_theta * magnitude_eta * mag_scale

plt.plot(magnitude)
y = fftpack.irfft(polar_to_scipy(0, magnitude, angle))

Random draws from this prior often look a bit like this:

No idea how happy the sampler would be about a model like this though…

@mtwest729 Obviously, I have no idea if this is even remotely interesting for your application, just playing around a little

mtwest729 · October 23, 2017, 3:13am

Working in the Fourier domain will definitely be helpful when I start to move away white Gaussian noise, as the Covariance Matrix can get pretty complex if there is any underlying frequency structure. If my hunch is right, this also requires me to include the frequency response of a boxcar time-domain filter on each of the sinusoids. But i am not completely sure about this, so I need to do some more scribbling.

But on the subject of priors, at least in the time domain

Probably the cleanest way to model a single sinusoid with a phase offset is to have the amplitude be positive and the phase to range across the full 2pi. You could also use a sine and a cosine with positive amplitudes and derive the phase shift from those values. It’s unclear which one would be more computationally with NUTS/ADVI. Probably the latter but recommendations welcome.
The prior on the sinusoidal frequency is tricky as the relevant band may span many orders of magnitude. In my original example, I was lazy and used a uniform distribution despite knowing it would weight higher frequencies more.
The most interesting & troublesome one is the number of sinusoids. I didn’t see anything in the documentation on having reversible-jump/model-scale evaluation built into the sampler. How to address this prior is currently a bit beyond me at the moment.

aseyboldt · October 23, 2017, 3:07pm

About 1: Using a uniform distribution on [0, 2pi) for the phase is trouble. The problem is the different topology of angles and intervals, you need to somehow include the wrapping behaviour of angles, ie 0 is the same as 2pi. If you don’t, the sampler will have a hard time and your posterior could even and up being multimodal, in which case the results could be misleading/wrong. If you also want to have the frequency anyway, then the combination of phase and frequency can just be expressed tuples (a, b), where frequency = hypot(a, b) and phase = atan2(a, b).

Some of my old code might help you with this:

class CartesianTransform(pm.distributions.transforms.Transform):
    """Transform from polar to cartesian coordinates."""
    name = 'cartesian'

    def backward(self, x):
        r = tt.sqrt((x ** 2).sum(-1))
        phi = tt.arctan2(x[..., 0], x[..., 1])
        return tt.stack([r, phi], -1)

    def forward(self, y):
        r, phi = y[..., 0], y[..., 1]
        a = r * tt.cos(phi)
        b = r * tt.sin(phi)
        return tt.stack([a, b], -1)

    def jacobian_det(self, y):
        return -0.5 * tt.log((y ** 2).sum(-1))
    
    def forward_val(self, y, point=None):
        return self.forward(y)

    
class VonMisesLength(pm.Continuous):
    def __init__(self, mu=None, kappa=None, length_dist=None, *args, **kwargs):
        trafo = kwargs.pop('transform', PolarTransform())
        shape = kwargs.pop('shape', (2,))
        shape = np.atleast_1d(shape)
        if len(shape) < 1 or shape[-1] != 2:
            raise ValueError('Invalid shape. Must end with 2.')
        if length_dist is None:
            raise TypeError('Missing argument: length_dist.')
        if sum(val is None for val in [mu, kappa]) == 1:
            raise TypeError('Specify either both or none of mu and kappa.')
        self._length_dist = length_dist
        self._mu = mu
        self._kappa = kappa
        self._default = np.ones(shape)
        super().__init__(transform=trafo, shape=shape, defaults=('_default',),
                         *args, **kwargs)
    
    def logp(self, x):
        r, phi = x[..., 0], x[..., 1]
        logp = self._length_dist.logp(r)
        if self._mu is not None:
            mu, kappa = self._mu, self._kappa
            pos = phi - mu
            pos_logp = kappa * tt.cos(pos) - tt.log(tt.i0(kappa))
            logp += pos_logp
        logp -= np.log(2 * np.pi)
        return logp

VanMisesLength works like pm.VanMises, but returns an angle and a length (stacked in the last dimension of the result). It doesn’t have the problem of multiple modes that the VanMises implementation has, either.

You can use it like this:

with pm.Model() as model:
    #length = pm.HalfNormal.dist(sd=1)
    # Define some prior *distribution* on the frequency
    length = pm.Bound(pm.Normal, lower=0).dist(mu=4, sd=0.5)
    # mu and kappa allow you to put a prior on the angle
    vals = VonMisesLength('a', mu=np.pi/4, kappa=1, length_dist=length, shape=(3, 2))
    frequency = vals[..., 0] # shape (3,). You could also just ignore this if you don't need it.
    angle = vals[..., 1]  # is in (-pi, pi)

On the transformed (cartesian) space this leads to stuff like this (this is what the sampler is working with):

About 3: Yes, this is a problem, where I never really know what to do either. I actually started looking into the Fourier version because that might be a way to work around that. Instead of trying to model a small number of sin terms with unknown frequency, model a large number of evenly spaced known frequencies, where most amplitudes are very small, and only a few have significant impact on the result.

aseyboldt · October 23, 2017, 3:22pm

Oh, and if you have frequencies with very different orders of magnitude, then maybe it is better to use VanMiseslength with some arbitraty length distribution that the sampler likes (say pm.Normal.dist(mu=4, sd=1) or so), then just ignore the second output and introduce a separate variable for the actual frequency:

with pm.Model() as model:
    length = pm.Bound(pm.Normal, lower=0).dist(mu=4, sd=1)  # The sampler seems to like this
    vals = VonMisesLength('a', mu=np.pi/4, kappa=1, length_dist=length, shape=(3, 2))
    # _ = vals[..., 0] # we don't need this
    phase = vals[..., 1]  # is in (-pi, pi)
    frequency = pm.HalfNormal('freq', ...)

junpenglao · May 8, 2018, 1:46pm

A post was split to a new topic: Inferencing trigonometric time series model

Topic		Replies	Views
Using non-Theano function in likelihood, complicated model for optimal experimental design Questions	0	395	May 22, 2020
Inferencing trigonometric time series model Questions	5	1476	May 14, 2018
Issues with theano in a python code used to generate mu in a Gaussian model v3 theano	3	550	April 20, 2022
Signal frequency estimation (regression with sinusoidal mean) in PyMC3 Questions	1	1348	April 7, 2018
Implementing normal CDF for signal detection theory model (includes full sample code) Questions	3	1097	August 17, 2020

How to model sinusoids in white gaussian noise

Related topics