Inferring a trigonometric time series model

Following the conversation in How to model sinusoids in white Gaussian noise, I have been trying to fit a sinusoidal model as well. I have tried a lot of different parametrizations with NUTS, including the following:

  1. Fit y(t) = A\sin(\omega t) + B\cos(\omega t) + noise, where A and B are drawn as Normal random variables.
  2. Draw A and B as Normal random variables, and fit y(t) = C\sin(\omega t + \phi), where C = \sqrt{A^{2}+B^{2}} and \phi = \tan^{-1}(B/A).

I tried some others, which were worse than these two.
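As a quick sanity check (not from the original post; the parameter values below are made up), the two parametrizations describe the same mean function, provided the phase is computed with arctan2 rather than plain arctan so it lands in the right quadrant:

```python
import numpy as np

# Assumed values, just for illustration
t = np.arange(1000) / 100.0
A, B, omega = 1.3, -0.7, 3.5

# Parametrization 1: A*sin + B*cos
y1 = A * np.sin(2 * np.pi * omega * t) + B * np.cos(2 * np.pi * omega * t)

# Parametrization 2: amplitude/phase, with C = sqrt(A^2 + B^2)
# and phi = arctan2(B, A) to get the quadrant right
C = np.hypot(A, B)
phi = np.arctan2(B, A)
y2 = C * np.sin(2 * np.pi * omega * t + phi)

print(np.allclose(y1, y2))  # prints True: the two forms coincide
```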

The conversation in this discussion has mostly revolved around the phase parameter, but I think the issue lies more with NUTS having problems in a nonlinear model. Very surprisingly (to me, at least!), Metropolis works fabulously on this model (I tried parametrization 1 above). There has been at least one previous report of NUTS doing miserably on a nonlinear model where Metropolis worked well, and it seems that issue wasn't resolved back then:

The simplest dataset:

import numpy as np
import matplotlib.pyplot as plt

t = np.arange(10000)/1000
# 3.5 Hz sine wave plus unit-variance Gaussian noise
data = np.sin(2*np.pi*3.5*t) + np.random.normal(size=t.shape[0])
plt.plot(t, data)


Here’s my pymc3 model:

import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    a = pm.Normal("a", mu=1, sd=2)
    b = pm.Normal("b", mu=1, sd=2)
    omega = pm.Gamma("omega", 1, 1)
    regression = a*tt.sin(2*np.pi*omega*t) + b*tt.cos(2*np.pi*omega*t)
    sd = pm.HalfCauchy("sd", 0.5)
    observed = pm.Normal("observed", mu=regression, sd=sd, observed=data)

Here’s how Metropolis sampling looks after just 60k iterations: it picks up all the parameters well and explores the space much more than NUTS. Even more surprisingly, this does not depend on where I initialize Metropolis; I am not initializing at the MAP estimate or anything here:

with model:
     tr = pm.sample(tune = 10000, draws = 50000, njobs = 3, step = pm.Metropolis())

And here’s how the NUTS run looks (omega is basically stuck where it was started). Initializing right at omega = 3.5 works, of course, but nothing else does:

with model:
     tr = pm.sample(tune = 4000, draws = 2000, njobs = 3)

I calculated the approximate shape of the log-likelihood surface the samplers are exploring (I just consider the sin term alone, with amplitude and noise scale fixed at 1), and the peak at 3.5 Hz is flanked by HUGE valleys! The surface also has lots of local maxima (although they are tiny compared to the mode):

from scipy.stats import norm

def mean_prediction(omega, t):
    return np.sin(2*np.pi*omega*t)

omega = np.linspace(0, 5, 100)

logpdfs = []
for i in range(omega.shape[0]):
    logpdfs.append(norm.logpdf(x=data, loc=mean_prediction(omega[i], t), scale=1.0))

logpdfs = np.array(logpdfs)
plt.plot(omega, np.sum(logpdfs, axis=-1))
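To put a number on the "lots of local maxima" claim, here is a small self-contained sketch (the seed, grid size, and grid range are assumptions chosen to mirror the dummy data above) that counts interior local maxima of the profile:

```python
import numpy as np
from scipy.stats import norm

# Regenerate the dummy data (seeded here for reproducibility)
rng = np.random.default_rng(42)
t = np.arange(10000) / 1000
data = np.sin(2 * np.pi * 3.5 * t) + rng.normal(size=t.shape[0])

# Log-likelihood profile over omega, amplitude and noise scale fixed at 1
omegas = np.linspace(0, 5, 100)
profile = np.array([
    norm.logpdf(data, loc=np.sin(2 * np.pi * om * t), scale=1.0).sum()
    for om in omegas
])

# A grid point is a local maximum if it beats both of its neighbours
interior = profile[1:-1]
n_local_max = int(np.sum((interior > profile[:-2]) & (interior > profile[2:])))
print(n_local_max)  # count of interior local maxima on the grid
```

With noisy data like this, the count comes out well above one: the profile really is multimodal, not just sharply peaked.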

So, in summary: NUTS gets stuck while sampling omega (and if you start at the right omega, it obviously gets the right answers), but nothing like that happens with Metropolis. Even with a moderate number of iterations, Metropolis recovers the right parameters irrespective of where it was started (note that the N_eff of omega is still smaller than those of a and b under Metropolis, but much larger than anything NUTS can produce).

Finally, a post earlier in this thread showed that NUTS seemingly works when the data were generated with a frequency of 0.1 Hz. At that low a frequency (and with many fewer time points than in my dummy datasets), the difference between the log-likelihood at the mode and at neighboring values is much smaller, and the peak at the mode is much broader than in the higher-frequency examples I have been discussing here. Basically, when the frequency is high, the log-likelihood drops very fast as you move away from the mode, giving a really sharp spike.
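A rough way to check this is to compare how far the log-likelihood falls a fixed distance away from the true frequency in the two regimes. This is a sketch, not from the thread: the dataset sizes are assumptions (the long one mirrors my dummy data, the short one stands in for the earlier 0.1 Hz example), and I use noise-free data so the comparison is deterministic:

```python
import numpy as np
from scipy.stats import norm

def ll_drop(t, f0, delta):
    """Drop in Gaussian log-likelihood when omega moves from f0 to f0 + delta,
    with noise-free data so the result is deterministic."""
    data = np.sin(2 * np.pi * f0 * t)
    ll_mode = norm.logpdf(data, loc=data, scale=1.0).sum()
    ll_off = norm.logpdf(data, loc=np.sin(2 * np.pi * (f0 + delta) * t), scale=1.0).sum()
    return ll_mode - ll_off

# High-frequency, long dataset (as in the dummy data above)
drop_high = ll_drop(np.arange(10000) / 1000, 3.5, 0.05)
# Low-frequency, much shorter dataset (assumed size)
drop_low = ll_drop(np.arange(500) / 100, 0.1, 0.05)

print(drop_high > drop_low)  # prints True: the peak is far sharper in the first regime
```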

I do not know whether this shape of the space being explored is the reason why NUTS is miserable (I don't know how to test that), so I am very confused about why this is happening. As this has happened previously with other nonlinear models and NUTS, I was wondering if you have any ideas about how to remedy it. In my specific case, this sinusoidal model is a small part of a much bigger model I am working with, so I do not think just using Metropolis will work in my actual use case. And in any case, we ought to understand why NUTS is failing in this way, especially given that it is almost always prescribed as much better than Metropolis at exploring.
@junpenglao @twiecki @fonnesbeck @aseyboldt

I am sorry for the really long post, and really appreciate the time you guys are taking to read this :slight_smile:


This is a very interesting discussion - I hope you don't mind that I made it into a new post.

Before I test your model, a quick comment:
I am not sure it is fair to say Metropolis is performing better, or even correctly. The posterior is multimodal and Metropolis is basically stuck in one mode - that's also why initializing NUTS at omega = 3.5 also produces a nice trace. Being stuck in a mode might look like stationarity, but it means the model is not actually being inferred correctly.

I see! But I was thinking that since the mode is so much bigger than all the other small local peaks, we would want our sampler to actually stick to that one mode. I was thinking the issue might be that a huge valley flanks the big mode on both sides, so approaching it from either side, the sampler runs into those valleys and can't cross that region of very low probability.

I don't think that is the correct intuition. Think of MCMC sampling as computing an integral: all these small local peaks add up to a lot of volume that NUTS is desperately trying to account for, while Metropolis is wrongly stuck in one mode. Notice also that since we are estimating an expectation using MCMC samples, it will not converge to anything meaningful if the expectation does not exist - which I suspect is what the NUTS sampler is showing here.
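The "stuck in one mode" behaviour is easy to reproduce with something minimal. The sketch below (not from the thread; the seed, starting point of 0.5, and proposal scale of 0.05 are assumptions) runs a hand-rolled random-walk Metropolis on omega alone, with the amplitude and noise scale fixed at their true values and a flat prior:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
t = np.arange(10000) / 1000
data = np.sin(2 * np.pi * 3.5 * t) + rng.normal(size=t.shape[0])

def log_lik(omega):
    # Conditional log-likelihood with amplitude and noise scale fixed at 1
    return norm.logpdf(data, loc=np.sin(2 * np.pi * omega * t), scale=1.0).sum()

# Random-walk Metropolis on omega, started far from the true 3.5 Hz
omega_cur, ll_cur = 0.5, log_lik(0.5)
for _ in range(2000):
    prop = omega_cur + rng.normal(scale=0.05)
    ll_prop = log_lik(prop)
    if np.log(rng.uniform()) < ll_prop - ll_cur:
        omega_cur, ll_cur = prop, ll_prop

# With this start and proposal scale, the chain stays trapped in local
# structure far from 3.5: the valleys cost hundreds of log-likelihood units
print(abs(omega_cur - 3.5) > 1.0)  # prints True
```

The chain looks perfectly stationary if you plot it, which is exactly the trap: a clean-looking trace from a single local mode is not evidence of correct inference.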

So I did some further simulation to demonstrate my point:
Using the same model, I fixed b and sd to their real values, and plotted the 2D conditional log-likelihood of a and omega:
Immediately you can see there is a lot of weight around a = 0, but let's zoom in on the correct value:
Indeed, there is a local maximum around 3.5, but it is not particularly salient.
Moreover, zooming in around omega = 0:
Again, you see a lot of volume.
All in all, I think this should convince you of why this model is difficult to sample, and why, although Metropolis seemingly returns a better-looking chain, its estimate is wrong.

Notebook here:

Thanks @junpenglao, yes - I realize your point about the volume exploration NUTS was doing here is correct. I have analyzed this in some other very simple nonlinear scenarios, and I guess we often don't realize how complicated the posteriors can be in these cases. I will post about those soon!
