How to model sinusoids in white gaussian noise


#1

I am trying to a handle on how to use a some of the theano math functions within a probability model.

My first effort was to reconstruct the parameters of a single unknown sinusoid in white Gaussian noise, which is a very common introductory problem in astrostatistics. It also has the advantage of having a semi-analytical Bayesian solution as well as a myriad of alternative classical methods.

If I am using normal likelihood function

  • X_obs = pm.Normal('X_obs', mu=signal, sd=sigma, observed=X)

is the expression below the proper way to define signal model for PyMC3?

  • signal = amplt * pm.math.sin(omega * t + phi)

Where the estimated variables are the

  1. line amplitude: amplt
  2. line angular frequency: omega
  3. line phase offset: phi
  4. noise variance: sigma

The variable t are the time samples we record along with the measured values of the time series X. One could separate the above sinusoid into two that are 90 degrees out of phase and potentially simplify the computations, but I am just looking to see if I have got the general gist of model construction in this framework.


#2

I don’t see anything wrong with what you’re doing here, or how you’re thinking about it. For many straightforward things you can pretty much use theano as you would numpy, just replace np with tt. Have you tried running this model? Do you get reasonable results?


#3

Because the above model runs surprisingly slow, I decided to simplify the model further to be

amplt = pm.Normal(‘amplt’, mu=1, sd=1)
freq = pm.Uniform(‘freq’, 0, 0.5)
mu_x = amplt * pm.math.sin(2*pi * t * freq)
X_obs = pm.Normal(‘X_obs’, mu=mu_x, sd=1, observed=x)

and it still took 18:40 to take 5000 samples with NUTS. The number of data points is 512. After about 2500 samples, it hit on the correct answer pair. I presume that if I ran the simulation longer, the posterior samples would likely be far more weighted to

I guess the underlying question is how can I speed up this inference to something faster than ~4.5 iterations per second. Lots of important problems aren’t a single 0-deg phase sinusoid but a complex frequency structure that may require a rather messy multi-level model. I expect there are methods one can leverage to improve performance but I am still somewhat new to this API.


#4

Following up with a longer run, the results are actually even worse. In the ADVI computation at the beginning, the average ELBO is -861.43, which I presume is pretty poor. The posterior estimates are awful. I am curious if anyone else has the same problem given this is such a simple model. As I have said before, my gut instinct is to believe I am missing some particular step but have yet to figure out what.


#5

I get perfect results running the model exactly as you specified it:

t = np.linspace(0, 10)
x = np.sin(2*np.pi * t * 0.1)
x += np.random.randn(*x.shape)

with pm.Model() as model:
    amplt = pm.Normal('amplt', mu=1, sd=1)
    freq = pm.Uniform('freq', 0, 0.5)
    mu_x = amplt * pm.math.sin(2*np.pi * t * freq)
    X_obs = pm.Normal('X_obs', mu=mu_x, sd=1, observed=x)
    
    trace = pm.sample()

It takes just about 1 second of sampling to produce 1000 iterations (the default) and posteriors are centered at true values. I think you should post your complete code, including data generation, for comparison - or try mine from above and see if it works well.


#6

Not sure where exactly I was going wrong but @aplavin has ironed out whatever it was. Many thanks.

And now to see if I can expand the above model to include phase, unknown noise variance and then more sinusoids.


#7

Just to let you know, you can do a fft in theano. So if you have equally spaced observations and your prior can be expressed reasonably on the discrete frequency domain, than looking into that might make sense.


#8

I was a bit curious what prior one could possibly use on the frequency and experimented a little.
It seems that if we set a uniform prior on the angle, and a strong shrinkage prior on each individual magnitude, we get an interesting prior on the functions.

import numpy as np
from scipy import stats, linalg, signal, spatial, fftpack
import matplotlib.pyplot as plt

%matplotlib inline

def polar_to_scipy(intercept, magnitude, angle):
    n = len(magnitude)
    assert len(angle) == n
    real = np.cos(angle)
    imag = np.sin(angle)
    vals = np.zeros(1 + 2 * n)
    vals[0] = intercept
    vals[1::2] = magnitude * real
    vals[2::2] = magnitude * imag
    return vals

n = 100
mag_scale = np.exp(-np.arange(n) / 20)
angle = 2 * np.pi * np.random.rand(n)
magnitude_theta = stats.halfcauchy().rvs(n)
magnitude_eta = np.abs(np.random.randn(n))
magnitude = magnitude_theta * magnitude_eta * mag_scale

plt.plot(magnitude)
y = fftpack.irfft(polar_to_scipy(0, magnitude, angle))

Random draws from this prior often look a bit like this:

image

image

No idea how happy the sampler would be about a model like this though…

@mtwest729 Obviously, I have no idea if this is even remotely interesting for your application, just playing around a little :slight_smile:


#9

Working in the Fourier domain will definitely be helpful when I start to move away white Gaussian noise, as the Covariance Matrix can get pretty complex if there is any underlying frequency structure. If my hunch is right, this also requires me to include the frequency response of a boxcar time-domain filter on each of the sinusoids. But i am not completely sure about this, so I need to do some more scribbling.

But on the subject of priors, at least in the time domain

  1. Probably the cleanest way to model a single sinusoid with a phase offset is to have the amplitude be positive and the phase to range across the full 2pi. You could also use a sine and a cosine with positive amplitudes and derive the phase shift from those values. It’s unclear which one would be more computationally with NUTS/ADVI. Probably the latter but recommendations welcome.
  2. The prior on the sinusoidal frequency is tricky as the relevant band may span many orders of magnitude. In my original example, I was lazy and used a uniform distribution despite knowing it would weight higher frequencies more.
  3. The most interesting & troublesome one is the number of sinusoids. I didn’t see anything in the documentation on having reversible-jump/model-scale evaluation built into the sampler. How to address this prior is currently a bit beyond me at the moment.

#10

About 1: Using a uniform distribution on [0, 2pi) for the phase is trouble. The problem is the different topology of angles and intervals, you need to somehow include the wrapping behaviour of angles, ie 0 is the same as 2pi. If you don’t, the sampler will have a hard time and your posterior could even and up being multimodal, in which case the results could be misleading/wrong. If you also want to have the frequency anyway, then the combination of phase and frequency can just be expressed tuples (a, b), where frequency = hypot(a, b) and phase = atan2(a, b).

Some of my old code might help you with this:

class CartesianTransform(pm.distributions.transforms.Transform):
    """Transform from polar to cartesian coordinates."""
    name = 'cartesian'

    def backward(self, x):
        r = tt.sqrt((x ** 2).sum(-1))
        phi = tt.arctan2(x[..., 0], x[..., 1])
        return tt.stack([r, phi], -1)

    def forward(self, y):
        r, phi = y[..., 0], y[..., 1]
        a = r * tt.cos(phi)
        b = r * tt.sin(phi)
        return tt.stack([a, b], -1)

    def jacobian_det(self, y):
        return -0.5 * tt.log((y ** 2).sum(-1))
    
    def forward_val(self, y, point=None):
        return self.forward(y)

    
class VonMisesLength(pm.Continuous):
    def __init__(self, mu=None, kappa=None, length_dist=None, *args, **kwargs):
        trafo = kwargs.pop('transform', PolarTransform())
        shape = kwargs.pop('shape', (2,))
        shape = np.atleast_1d(shape)
        if len(shape) < 1 or shape[-1] != 2:
            raise ValueError('Invalid shape. Must end with 2.')
        if length_dist is None:
            raise TypeError('Missing argument: length_dist.')
        if sum(val is None for val in [mu, kappa]) == 1:
            raise TypeError('Specify either both or none of mu and kappa.')
        self._length_dist = length_dist
        self._mu = mu
        self._kappa = kappa
        self._default = np.ones(shape)
        super().__init__(transform=trafo, shape=shape, defaults=('_default',),
                         *args, **kwargs)
    
    def logp(self, x):
        r, phi = x[..., 0], x[..., 1]
        logp = self._length_dist.logp(r)
        if self._mu is not None:
            mu, kappa = self._mu, self._kappa
            pos = phi - mu
            pos_logp = kappa * tt.cos(pos) - tt.log(tt.i0(kappa))
            logp += pos_logp
        logp -= np.log(2 * np.pi)
        return logp

VanMisesLength works like pm.VanMises, but returns an angle and a length (stacked in the last dimension of the result). It doesn’t have the problem of multiple modes that the VanMises implementation has, either.

You can use it like this:

with pm.Model() as model:
    #length = pm.HalfNormal.dist(sd=1)
    # Define some prior *distribution* on the frequency
    length = pm.Bound(pm.Normal, lower=0).dist(mu=4, sd=0.5)
    # mu and kappa allow you to put a prior on the angle
    vals = VonMisesLength('a', mu=np.pi/4, kappa=1, length_dist=length, shape=(3, 2))
    frequency = vals[..., 0] # shape (3,). You could also just ignore this if you don't need it.
    angle = vals[..., 1]  # is in (-pi, pi)

On the transformed (cartesian) space this leads to stuff like this (this is what the sampler is working with):
image

About 3: Yes, this is a problem, where I never really know what to do either. I actually started looking into the Fourier version because that might be a way to work around that. Instead of trying to model a small number of sin terms with unknown frequency, model a large number of evenly spaced known frequencies, where most amplitudes are very small, and only a few have significant impact on the result.


GP kernel for polar angle dimension
#11

Oh, and if you have frequencies with very different orders of magnitude, then maybe it is better to use VanMiseslength with some arbitraty length distribution that the sampler likes (say pm.Normal.dist(mu=4, sd=1) or so), then just ignore the second output and introduce a separate variable for the actual frequency:

with pm.Model() as model:
    length = pm.Bound(pm.Normal, lower=0).dist(mu=4, sd=1)  # The sampler seems to like this
    vals = VonMisesLength('a', mu=np.pi/4, kappa=1, length_dist=length, shape=(3, 2))
    # _ = vals[..., 0] # we don't need this
    phase = vals[..., 1]  # is in (-pi, pi)
    frequency = pm.HalfNormal('freq', ...)