Advice for Time Series Forecasting

Hi,

I don’t have much experience with time series forecasting, so I thought I’d ask for advice.

I’ve got thousands of time series from patients suffering from a certain disease. Each time series measures a specific score that decreases over time, but not always completely linearly. There can be plateaus or short-term improvements. The delta between measurement dates can vary.

Given a new patient with some measurement points, I would now like to predict the possible course of their decline based on the existing data.

I thought I could do this using Bayesian structural time series.

Any thoughts on this?

Best,
Thorsten

1 Like

This is a really interesting question and I would love to know more about this idea as well, since it relates to similar problems in my work. I have explored this same idea, but in a performance context. I’ve seen this applied in sports, where there is some expected decline with age. The solution I have seen for problems like this is to calculate the delta across years for each individual, then average the deltas at each time bin (e.g., age) and fit a single curve that can be applied to all. This curve explains the delta w.r.t. the time axis. In theory, this can be used to predict further in time given a starting point. E.g., in your case, you have the measurements and you could simply age out from the most recent observation. My issue with these solutions has always been that there is never a single curve that fits all individuals, and that there is some history that needs to be taken into account, such as initial conditions and performance (disease progression in your case) thereafter.
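
For concreteness, the delta-averaging idea looks roughly like this (toy data and hypothetical column names, purely to illustrate the mechanics):

import numpy as np
import pandas as pd

# Fake long-format data: one row per measurement (hypothetical columns)
rng = np.random.default_rng(0)
rows = []
for pid in range(200):
    ages = np.sort(rng.uniform(40, 80, size=rng.integers(5, 15)))
    scores = 48 - 0.4 * (ages - 40) + rng.normal(0, 1, ages.size)
    rows += [{"patient_id": pid, "age": a, "score": s} for a, s in zip(ages, scores)]
df = pd.DataFrame(rows).sort_values(["patient_id", "age"])

# Per-individual rate of change between consecutive measurements
grp = df.groupby("patient_id")
df["delta"] = grp["score"].diff() / grp["age"].diff()

# Average those deltas within time (age) bins and fit one curve to them
df["age_bin"] = pd.cut(df["age"], bins=20)
mean_delta = df.groupby("age_bin", observed=True)["delta"].mean()
mids = np.array([iv.mid for iv in mean_delta.index], dtype=float)
mask = mean_delta.notna().to_numpy()
coef = np.polyfit(mids[mask], mean_delta[mask], deg=2)

# Projecting a new individual forward is then a cumulative sum of predicted
# deltas starting from their most recent observation.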

One solution I have tried is to use hierarchical splines or hierarchical HSGPs to fit a curve to each individual. I’ve only had minor success with this since the number of individuals in my data is similar to yours (in the thousands). I actually put together an example of this that is HEAVILY based on example notebooks from the pymc examples (1, 2, 3). The main reason I wrote my own is because in the examples by @bwengals, the length of each series was the same. I wanted to understand if we could apply hierarchical splines/HSGPs if the series were of different length. It just took a little bit of thinking about the indexing, but it did seem to work very well for the toy problem. This is an approach I am actively exploring since it does allow for plateaus/increases.
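
To give a flavor of the indexing trick for series of different lengths, here is a stripped-down sketch: long-format data with a patient index, and per-patient intercepts and slopes standing in for the spline/HSGP basis (toy data, hypothetical names):

import numpy as np
import pymc as pm

# Long format: one row per measurement, indexed by patient; lengths can differ
patient_idx = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
day = np.array([0.0, 30.0, 75.0, 0.0, 40.0, 0.0, 20.0, 55.0, 90.0])
score = np.array([45.0, 43.0, 40.0, 47.0, 45.0, 42.0, 41.0, 39.0, 36.0])
n_patients = patient_idx.max() + 1

with pm.Model() as hier_trend:
    # Population-level decline
    mu_a = pm.Normal("mu_a", 45, 10)
    mu_b = pm.Normal("mu_b", -0.03, 0.05)
    sigma_a = pm.HalfNormal("sigma_a", 5)
    sigma_b = pm.HalfNormal("sigma_b", 0.05)

    # Per-patient intercepts and slopes (partial pooling)
    a = pm.Normal("a", mu_a, sigma_a, shape=n_patients)
    b = pm.Normal("b", mu_b, sigma_b, shape=n_patients)

    # The indexing handles unequal series lengths: each row looks up its patient
    mu = a[patient_idx] + b[patient_idx] * day
    sigma = pm.HalfNormal("sigma", 2)
    pm.Normal("y", mu, sigma, observed=score)

    idata = pm.sample()

Swapping the per-patient linear term for a spline or HSGP basis keeps the same indexing pattern; only the columns multiplying the coefficients change.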

I’ve thought about approaching this in a few other ways:

  1. If the curve is smooth (e.g., progressive decline of illness), then I visualize this in my mind as a basic equation of motion of a projectile. In the simplest case, if we have the initial velocity and angle, then we can calculate the path of the trajectory over time. In your case, one could think of each individual as having their own initial conditions plus added noise. This of course would make strong assumptions about the smoothness of the decline and may violate what you mentioned about plateaus and improvements. However, this may not be totally unacceptable, since estimating plateaus/increases may be difficult without intervention. Additionally, estimating those parameters would be straightforward since it’s like fitting a nonlinear regression in a traditional hierarchical format.

  2. I’ve explored the idea of fitting these types of curves using something like a hierarchical prophet; however, when I tried this I struggled with sampling/divergences. I would need to think quite a bit more about the components of the model if I were going to go down this route, but I do think it is possible to make this model work for this application. If I am thinking about patient decline correctly, I imagine this as something like a constant offset over time + a slower Fourier component + a faster Fourier component to capture noise. I’m just blindly shooting from the hip here, so that may be completely wrong.

  3. A more advanced, and likely powerful, way to think about this problem is as an autoregressive problem. I’ve tried to dig into this quite a bit but had a hard time conceptualizing how I would go about solving it. I’ve played around with pm.AR but never got too far (a bare-bones pm.AR sketch follows this list). In reality, since there are evolving dynamics where each time point is highly informed by the prior observation(s) (+ noise), this seems like the perfect place for a Kalman filter. I am really interested to dig a bit more into pymc_extras.statespace for something like this. I would recommend you watch the video by @jessegrabowski about his work on pymc_extras.statespace (discourse post). I am still working through the classic Time Series Analysis by State Space Methods, so I am not confident in my ability to provide guidance on this approach. If Jesse has any thoughts on this I would love to get his input!
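
As a starting point for point 3, this is roughly the bare-bones pm.AR setup I was playing with: an AR(1) with a drift (constant) term on a single toy series. Extending it hierarchically across patients takes more indexing work, and none of the numbers below are meaningful:

import numpy as np
import pymc as pm

# One toy series of declining scores
y = np.array([45.0, 44.1, 43.5, 43.6, 42.0, 41.2, 40.9, 39.5])

with pm.Model() as ar_model:
    # With constant=True, rho[0] is the drift term and rho[1] the AR(1) coefficient
    rho = pm.Normal("rho", mu=[-0.1, 0.9], sigma=[0.5, 0.2], shape=2)
    sigma = pm.HalfNormal("sigma", 1.0)
    init = pm.Normal.dist(y[0], 2.0)
    pm.AR("obs", rho=rho, sigma=sigma, constant=True, init_dist=init, observed=y)
    idata = pm.sample()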

I’m sorry I am not providing a clearer answer on structural time series, but I did want to throw my two cents out there since I have been thinking about basically the same problem. Hopefully we can get a good discussion going on this topic!

– Justin

3 Likes

Thank you for your detailed response. I’m currently exploring in all directions. Have you looked at Darts?

It seems to support what we want:

https://unit8co.github.io/darts/userguide/forecasting_overview.html

However, I’m not convinced by their uncertainty estimates.

Best
Thorsten

Maybe a dynamic factor analysis (DFA) would help? I don’t have any hands-on experience with them to say for sure whether they’re suitable, and it is rapidly approaching Friday evening, but I am aware that they model N time series as combinations of k latent processes, with k << N, so it’s something of a dimensionality-reduction technique. I don’t know whether you can fit a DFA and then apply it to a new time series.
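
I haven’t fit one myself, so take this with a grain of salt, but structurally the idea would look something like the following in PyMC: N observed series expressed as loadings on k latent random-walk factors (toy dimensions; a real DFA also needs identification constraints on the loadings):

import numpy as np
import pymc as pm

rng = np.random.default_rng(1)
N, T, k = 8, 60, 2
true_factors = np.cumsum(rng.normal(size=(k, T)), axis=1)
Y = rng.normal(size=(N, k)) @ true_factors + rng.normal(0, 0.5, size=(N, T))

with pm.Model() as dfa:
    # k latent random-walk factors (time is the last axis)
    factors = pm.GaussianRandomWalk(
        "factors", sigma=1.0, init_dist=pm.Normal.dist(0, 1), shape=(k, T)
    )
    # Each observed series is a linear combination of the factors
    loadings = pm.Normal("loadings", 0, 1, shape=(N, k))
    sigma = pm.HalfNormal("sigma", 1.0)
    pm.Normal("obs", mu=loadings @ factors, sigma=sigma, observed=Y)
    idata = pm.sample()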

I’m only aware of them in the R world, but as a reference here’s A Very Short Introduction.

Also, here is a study that uses DFA that seems like it might relate to your issue, but it’s been on my “to read” list for 2+ years now.

And it might be worthwhile to look at the R Task View on time series, which ought to provide some ideas.

Also, you mention unevenly spaced observations, which makes me think of CAR (continuous-time AR) models.

Hopefully this is useful, until someone more well versed comes along.

1 Like

DFA would be quite easy to write as a statespace model; PRs to pymc-extras to add that are welcome. The model is covered in Durbin and Koopman, Chapter 3.7.

In general I agree with everything @JAB said above. I’ll just add a few things:

  1. If you have plateaus, decreases, and short-term improvements, it might imply an HMM where a discrete latent state governs the time series dynamics.
  2. The statespace module unfortunately doesn’t scale well at the moment, so if you have thousands of time series that you want to jointly estimate, it’s not a good choice. I hope this will change in the near future.
  3. For cases with big data and many time series, I suggest a method that reduces to linear regression: prophet, splines, or HSGP (see the small sketch after this list). Recursive models are always going to be far more computationally intensive than y = X\beta, even if getting X requires some effort.
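
To illustrate point 3: the effort goes into building the design matrix X once, after which the model is ordinary linear regression. For example, a small prophet-style piecewise-linear trend basis (illustrative only):

import numpy as np

def piecewise_linear_basis(t, changepoints):
    # Columns: intercept, global slope, and a slope change after each changepoint
    return np.column_stack([
        np.ones_like(t),
        t,
        np.maximum(t[:, None] - changepoints[None, :], 0.0),
    ])

t = np.linspace(0.0, 720.0, 80)                   # measurement days
changepoints = np.linspace(0.0, 720.0, 12)[1:-1]  # interior changepoints
X = piecewise_linear_basis(t, changepoints)       # shape (80, 12)
# y ~ Normal(X @ beta, sigma): no recursion, and hierarchy enters by indexing beta.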
2 Likes

Ok, thanks for all the input. This seems to be harder than I initially expected. What if I simplify the problem? Let’s say I have a reference day and the corresponding value on that day (V0). Now I pick 6 measurement days before and after it. DB1 - DB6 could be the days before the reference day, and D1 - D6 the days after it (all given as distance to the reference day). VB1-VB6 and V1-V6 contain the corresponding values.

Couldn’t I set up a model using V1-V6 as response variables and the others as predictors? If yes, couldn’t I do an out-of-sample prediction of V1-V6 for a new time series?

With ChatGPT I created sample data for that:

import numpy as np
import pandas as pd
import random

np.random.seed(42)
random.seed(42)  # the random module is also used below, so seed it for reproducibility

def simulate_measurement():
    # Simulate the time series over 2 years (720 days)
    num_days = 720
    initial_score = random.uniform(40, 48)
    decline_rate_per_day = random.uniform(0.5 / 30, 1.5 / 30)  # Per day decline
    days = np.sort(np.random.choice(range(20, 721), num_days // 10, replace=False))  # 72 random measurement days, ~10 days apart on average
    scores = initial_score - decline_rate_per_day * days

    ref_index = random.randint(6, len(days) - 7)  # Ensure we have 6 measurements before and after
    ref_day = days[ref_index]
    ref_value = scores[ref_index]

    # Pick the 6 direct measurements before and after the reference day
    indices_before = list(range(ref_index - 6, ref_index))[::-1]
    indices_after = list(range(ref_index + 1, ref_index + 7))

    db = [ref_day - days[i] for i in indices_before]
    vb = [scores[i] for i in indices_before]

    d = [days[i] - ref_day for i in indices_after]
    v = [scores[i] for i in indices_after]

    return {
        'V0': ref_value,
        'DB1': db[0], 'VB1': vb[0],
        'DB2': db[1], 'VB2': vb[1],
        'DB3': db[2], 'VB3': vb[2],
        'DB4': db[3], 'VB4': vb[3],
        'DB5': db[4], 'VB5': vb[4],
        'DB6': db[5], 'VB6': vb[5],
        'D1': d[0], 'V1': v[0],
        'D2': d[1], 'V2': v[1],
        'D3': d[2], 'V3': v[2],
        'D4': d[3], 'V4': v[3],
        'D5': d[4], 'V5': v[4],
        'D6': d[5], 'V6': v[5],
    }

data_list = [simulate_measurement() for _ in range(100)]
df = pd.DataFrame(data_list)

# Display the resulting DataFrame
print(df)
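
To make the idea concrete, here is a rough (untested) sketch of the kind of model I have in mind on top of this data frame, treating V1-V6 as jointly regressed responses:

import pymc as pm

predictors = ["V0",
              "DB1", "VB1", "DB2", "VB2", "DB3", "VB3",
              "DB4", "VB4", "DB5", "VB5", "DB6", "VB6",
              "D1", "D2", "D3", "D4", "D5", "D6"]
responses = ["V1", "V2", "V3", "V4", "V5", "V6"]

X = df[predictors].to_numpy()
Y = df[responses].to_numpy()

with pm.Model() as wide_model:
    X_data = pm.Data("X_data", X)
    intercept = pm.Normal("intercept", 40, 10, shape=len(responses))
    beta = pm.Normal("beta", 0, 1, shape=(len(predictors), len(responses)))
    sigma = pm.HalfNormal("sigma", 2, shape=len(responses))
    mu = intercept + X_data @ beta
    # shape tied to mu so the predictors can be swapped for out-of-sample prediction
    pm.Normal("Y_obs", mu, sigma, observed=Y, shape=mu.shape)
    idata = pm.sample()

# For a new patient: pm.set_data({"X_data": X_new}) inside the model context,
# then pm.sample_posterior_predictive(idata) to predict V1-V6.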

The best approach is to have some kind of physical model of what you think is going on that can serve as a basis. For example, is it a downward trend with noisy measurements, or can the trend actually plateau? Either way, you can model this with prophet’s additive model (which is very simple and would be easy to translate to PyMC if someone hasn’t done that already). You can come up with multiple models and, with this much data, compare them.

One way to do this is to have an underlying time series with a negative trend and then noisy measurements around the trend. The trend can be modeled parametrically or non-parametrically. You can model the underlying trend for all dates even with sparse measurements, or in some cases, for example with normal noise, you can analytically work out the change over multiple days. E.g., if the process is a random walk with a standard normal change each day, the change over two days is the sum of two standard normals, whose variance is the sum of the variances and whose location is the sum of the locations (often assumed to be zero).
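
As a small illustration of that last trick: if the latent process is a random walk with drift, the increment between two observations that are dt days apart is Normal(drift * dt, sigma * sqrt(dt)), so you can model the observed differences directly without filling in the unobserved days. A bare sketch with made-up numbers:

import numpy as np
import pymc as pm

days = np.array([0.0, 20.0, 55.0, 90.0, 150.0])   # irregular measurement days
score = np.array([45.0, 44.0, 42.5, 41.0, 38.0])
dt = np.diff(days)

with pm.Model() as rw_gaps:
    drift = pm.Normal("drift", 0.0, 0.1)
    sigma = pm.HalfNormal("sigma", 0.5)
    # Model the between-visit increments; the gap length enters through dt
    pm.Normal("obs", mu=drift * dt, sigma=sigma * np.sqrt(dt), observed=np.diff(score))
    idata = pm.sample()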

3 Likes

This is all very useful advice. @jessegrabowski, since the statespace module does not currently scale well, would it be feasible to impose a hierarchical structure on the weights of pm.AR?

I’ll let Jesse confirm, but I guess this wouldn’t scale better, as you’re keeping the recursive structure – and the hierarchical structure should complicate that further. My bet in this case would be HSGPs.

What would it take to change this @jessegrabowski ?

1 Like

These are the issues currently open that would give performance gains. 406 is the most important since it will allow us to actually start benchmarking performance across large panels of time series. 394 would allow for better performance on very long time series, and 332 promises to be an across-the-board speedup. These issues in pytensor are also relevant:

1100 would unlock more rewrites for matrix multiplication, but it isn’t necessarily an optimization itself.

There is also room for gains by working with square-root filters. This is especially nice since we sample covariance priors in Cholesky form anyway, so we’d never actually need to instantiate the covariance matrix. I already wrote the filters, but actually using them is blocked by pytensor issue 1099, so we can’t compute their gradients.

Aside from all that, there are also Chandrasekhar recursions as a replacement for Kalman filtering in the cases where they are permitted, which I think covers the majority of interesting cases.

Finally, I think getting things over to numba is promising as well. There are some issues with the gradients currently being generated that prevent that, but I think it’s a better long-term solution than JAX, which is much more difficult to extend with specialized Ops that could be used to speed up computation, like solve_discrete_are and solve_discrete_lyapunov. I made an issue on the jax repo about these Ops but it was not warmly received. On the other hand, we already have numba dispatches for them.

4 Likes

I recently did an AR model with Student-t innovations using pm.CustomDist on a panel of several thousand time series, and it fit in a reasonable time (<= 1 hr). For simple models (e.g., no hidden states) this is where I’d start.

I think Prophet is still a really good general starting point. I go through an implementation here, and there’s also an example notebook. Neither goes into how to do hierarchy, but it’s just y = X\beta, so you’ll just be doing the usual indexing tricks for regression models. If you go that route and get stuck, don’t hesitate to make a thread asking for help.

2 Likes

To make sure I understand: this is for processes which don’t have local trends (i.e., plateaus, decreases, improvements, etc.)?

You can represent trends with a drift term in an AR model. I’m talking about basically any scan model that has only one return value in the inner function. The logp inference can always work those out, and they fit really fast in nutpie.

Ok, thanks, that’s clearer to me. I’m basically trying to understand how you define “hidden states” (and their absence).

I recently found this: GitHub - fraenkel-lab/mogp: Mixture of Gaussian Processes Model for Sparse Longitudinal Data

I think it does what I want!

2 Likes