Time series data ingestion question

I will state my question generically, in terms of distance, time, and an object. Say I have a range of distance that is computed once per day, with an upper bound and a lower bound. It is recalculated once per day, but its size changes from day to day; call that size Y. So Y might be 8 ft one day, 10 ft another, and so on. Now say I have an object moving through this space whose velocity is measured every second; call it X. From a machine learning point of view, I am trying to understand, measure, and predict how X moves across Y and goes beyond the upper and lower bounds. Meaning…

My question is about feeding machine learning models as a whole. The greatest potential per day is the furthest distance X moves past the upper or lower bound, which is a single value. Say that on day one, X reached the upper bound of Y and went 4 ft past it. That 4 is a scalar, but I need to understand how X per second changes on its way to ending up 4 ft past that day's upper bound, over a period of, say, 10 years.

I understand there are a bunch of ways to play this. The most obvious is to compute the max distance past the upper or lower bound and map it back in. Then I have a single value repeated 86,400 times per day against a per-second X that is constantly changing. I have also played with putting 0 on every second X is within the bounds, then repeating the max distance for every second after X breaches the upper or lower bound. If you simply subtract the upper or lower bound from X each second and treat that as your independent variable, you're quantifying change per second, whereas I need to measure, quantify, and predict the max distance past the upper or lower bound.
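For concreteness, here is a minimal numpy sketch of the three encodings described above, using toy 4-second "days". The function names, the signed-distance convention (positive above the upper bound, negative below the lower bound), and the (n_days, seconds_per_day) array layout are assumptions for illustration, not an established recipe.

```python
import numpy as np


def per_second_exceedance(x, lower, upper):
    """Signed distance of x beyond [lower, upper]; 0 while x is inside the band.
    Positive above the upper bound, negative below the lower bound."""
    return np.where(x > upper, x - upper, np.where(x < lower, x - lower, 0.0))


def daily_encodings(x_by_day, lower, upper):
    """x_by_day: shape (n_days, seconds_per_day), one row per day.
    lower, upper: shape (n_days,), that day's band (its width Y varies)."""
    # (c) per-second distance past the band: the "independent variable" version
    exc = per_second_exceedance(x_by_day, lower[:, None], upper[:, None])
    # One scalar per day: the furthest X got past either bound (the target)
    daily_max = np.abs(exc).max(axis=1)
    # (a) the daily scalar repeated on every second of its day
    repeated = np.broadcast_to(daily_max[:, None], x_by_day.shape).copy()
    # (b) 0 until the first breach of the day, the daily max from then on
    breached = np.logical_or.accumulate(exc != 0, axis=1)
    after_breach = np.where(breached, daily_max[:, None], 0.0)
    return exc, daily_max, repeated, after_breach


# Two toy "days" of 4 seconds each, with a different band width per day
x = np.array([[0.0, 5.0, 12.0, 9.0],
              [3.0, -3.0, 1.0, 4.0]])
lower = np.array([0.0, 0.0])
upper = np.array([10.0, 8.0])
exc, daily_max, repeated, after_breach = daily_encodings(x, lower, upper)
# exc          == [[0, 0, 2, 0], [0, -3, 0, 0]]
# daily_max    == [2, 3]
# after_breach == [[0, 0, 2, 2], [0, 3, 3, 3]]
```

Only `daily_max` is a genuinely per-day quantity; encodings (a) and (b) copy it onto every second. One thing to watch with those: if rows are split into train and test at random, seconds from the same day share a target, so splitting by whole days is the safer choice.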

Does anyone have any insight or experience with this inconsistency of time across calculations of different lengths? I run into it a lot: there is a base layer of time (per second) and math done over larger windows, and I need to relate a calculation that can occur at any given second to something that occurs every hour or even every day. How do you deal with this from a mathematical point of view, specifically for an algorithm within an ML model ingesting the data?

I've looked at it from the math point of view: a single value representing a day, which repeats statically, say, 86,400 times, against a calculation that is dynamic and takes a different value at each of those 86,400 seconds, over a period of, say, 5 years. I definitely understand the mathematical variations in how you can deal with this. I am more curious how you deal with it when feeding models as a whole and building out features, and the advantages or disadvantages of one way versus another. I certainly understand "try them all and use the most reliable or stable statistical distribution"… but maybe someone with more experience has some insight into how to tackle this from an ML algorithm point of view? Thanks.

I don’t really understand, but it sounds like you’re making the problem more complicated than it needs to be. Just model the thing you want to model? I don’t understand all the added complexity about repeating numbers and max values. If the thing doesn’t move it doesn’t move, so what?

If you have a vehicle driving and you record its speed every hour for 10 hours, you will have 10 instances, a vector of length 10. But if you're measuring RPM every second, that gives you a vector of 3,600 per hour, or 36,000 over 10 hours. This is not what I am doing, so forget the reason for the logic. My question is about the relationship between vectors of inconsistent length. Let's just say it's in a data frame. How do you deal with the fact that per hour your RPMs will have 3,600 rows while your hourly speed has a single value at row 3,600? What I normally do is map the hourly value back onto each of that hour's 3,600 rows. So say at hour 1 my speed is 50 and my RPMs update every second. Then iteration 1 pairs rpm[0] = 3000 with speed[0] = 50, iteration 2 pairs rpm[1] = 3012 with speed[1] = 50, iteration 3 pairs rpm[2] = 3100 with speed[2] = 50, iteration 4 pairs rpm[3] = 3150 with speed[3] = 50, and so on.
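The mapping described above is forward-filling (upsampling) the hourly series onto the per-second grid; the opposite direction is aggregating (downsampling) the per-second series to one row per hour. A small numpy sketch of both, using toy 4-second "hours" and assuming each hourly value is stamped at the start of its hour; `forward_fill` and `block_mean` are hypothetical helper names.

```python
import numpy as np


def forward_fill(low_t, low_vals, high_t):
    """Upsample: give each high-frequency timestamp the most recent low-frequency value.
    Timestamps earlier than the first low-frequency observation get NaN."""
    # Index of the last low-frequency timestamp <= each high-frequency timestamp
    idx = np.searchsorted(low_t, high_t, side="right") - 1
    out = np.full(high_t.shape, np.nan)
    ok = idx >= 0
    out[ok] = low_vals[idx[ok]]
    return out


def block_mean(high_vals, block):
    """Downsample: one mean per consecutive block of `block` samples (remainder dropped)."""
    n = len(high_vals) // block
    return high_vals[: n * block].reshape(n, block).mean(axis=1)


seconds = np.arange(8)                 # two toy "hours" of 4 seconds each
hour_starts = np.array([0, 4])
speed = np.array([50.0, 60.0])         # one speed value per hour
rpm = np.array([3000.0, 3012.0, 3100.0, 3150.0, 3200.0, 3200.0, 3300.0, 3300.0])

speed_per_second = forward_fill(hour_starts, speed, seconds)  # 50 x4, then 60 x4
rpm_per_hour = block_mean(rpm, 4)                             # [3065.5, 3250.0]
```

Forward-filling treats the slow series as constant within each hour; aggregating keeps one row per hour but discards the within-hour detail. Those are the two basic choices when vectors of different lengths have to share one table.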

There is a reason I am using a per-hour resolution against something that updates every second. I understand it's not conventional, and I have a reason for breaking convention: resolution and evolution are different fundamentals. How something evolves relative to one calculation, compared with how another calculation resolves in structure, tells you more about what it will do next than two calculations evolving side by side.

Like I said, forget RPM and speed; I have something else I am trying to do… I have always dealt with this the way I described above. My question is whether anyone has experience dealing with time like this. I am not adding anything. I'm simply measuring something like the speed of a planet at full resolution, i.e. at the completion of its cycle, which occurs once per 24 hours (relative to the Earth), against, say, ocean temperature every second. I was wondering if anyone has dealt with the nature of this inconsistent relationship.

Yes, just do what everyone does and move on. I get that the world is flat, and we should all accept this and just move forward and do what's to be done because that's what's done. I'm asking about what if the world is not flat! I get that it's difficult to grasp.

I think you're asking about state-to-observation mappings when things are at different temporal resolutions? The RPM example, where the hourly value is forward-filled onto every second, is a special case where you assume the unobserved high-frequency dynamics are constant, which I agree is a bad assumption in general. I have tooling for this in gEconpy for cases where, for example, you write a model of quarterly economic behavior but only get annual data from the national statistics office. That all lives in the linear state space framework, so you're going to get a dissertation about that next. It might be off the mark for your use case. The general answer doesn't change: you need 1) a high-frequency latent space, 2) low-frequency observations, 3) an aggregation function, and 4) a bunch of bookkeeping.
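The four ingredients can be shown in miniature. This numpy sketch (the function name and the "sum"/"last" options are my own, for illustration) takes a high-frequency latent path, applies an aggregation function per period, and does the bookkeeping of placing each aggregate on the last high-frequency step of its period with NaN everywhere else. NaN marks "not observed", which is different from forward-filling, where the gaps are asserted to equal a known value.

```python
import numpy as np


def observe_low_frequency(latent, s, agg="sum"):
    """Map a high-frequency latent path to low-frequency observations.

    latent: 1-D float array at the high frequency, length a multiple of s.
    s: aggregation period (4 for quarterly -> annual).
    agg: "sum" for flows (annual total), "last" for stocks (end-of-period value).
    Returns an array on the high-frequency grid: the aggregate sits at the last
    step of each period and every other step is NaN (missing, not zero).
    """
    blocks = latent.reshape(-1, s)        # one row per low-frequency period
    if agg == "sum":
        values = blocks.sum(axis=1)
    elif agg == "last":
        values = blocks[:, -1]
    else:
        raise ValueError(f"unknown aggregation: {agg}")
    y = np.full(latent.shape, np.nan)
    y[s - 1 :: s] = values                # observed only at period ends
    return y


latent = np.arange(1.0, 9.0)                        # two years of quarters: 1..8
y_flow = observe_low_frequency(latent, 4, "sum")    # NaN x3, 10, NaN x3, 26
y_stock = observe_low_frequency(latent, 4, "last")  # NaN x3, 4, NaN x3, 8
```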

In the linear Gaussian state space (LGSS) case, you have a state space system at the high frequency (the frequency where the thing you're studying actually lives, quarterly in my macroeconomics example):

\begin{aligned} x_t &= T x_{t-1} + R \eta_t, \quad \eta_t \sim \mathcal{N}(0, Q) \\ y_t &= Z x_t + \varepsilon_t, \quad \varepsilon_t \sim \mathcal{N}(0, H) \end{aligned}
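As a sanity check on the notation, the two equations can be simulated directly. This is a plain numpy sketch, not gEconpy's API; `simulate_lgss` is a hypothetical helper that draws one path of the state and the observations.

```python
import numpy as np


def simulate_lgss(T, R, Q, Z, H, x0, n_steps, seed=0):
    """Draw one path from x_t = T x_{t-1} + R eta_t and y_t = Z x_t + eps_t,
    with eta_t ~ N(0, Q) and eps_t ~ N(0, H)."""
    rng = np.random.default_rng(seed)
    k = R.shape[1]      # number of state shocks
    p = Z.shape[0]      # number of observed series
    x = np.asarray(x0, dtype=float)
    xs, ys = [], []
    for _ in range(n_steps):
        x = T @ x + R @ rng.multivariate_normal(np.zeros(k), Q)   # transition
        y = Z @ x + rng.multivariate_normal(np.zeros(p), H)       # observation
        xs.append(x)
        ys.append(y)
    return np.array(xs), np.array(ys)


# Scalar AR(1) state observed with a little noise
T = np.array([[0.9]])
R = np.array([[1.0]])
Q = np.array([[0.1]])
Z = np.array([[1.0]])
H = np.array([[0.01]])
xs, ys = simulate_lgss(T, R, Q, Z, H, x0=np.zeros(1), n_steps=20)
```

Every y_t here exists at the high frequency; the mixed-frequency part comes from what Z aggregates and which y_t are actually observed.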

To map this to the lower frequency where the data is actually observed, we want to embed an aggregation matrix into the design matrix Z. To make things concrete, take my quarterly to annual example. x_t lives at quarterly frequency, y_t is observed annually. Suppose we only observe one time series, and the annual value is the sum of the four quarterly values. So Z takes the following structure:

Z_\text{sum} = \begin{bmatrix} 1 & 1 & 1 & 1 \end{bmatrix}

This is where the specific aggregation function shows up. If instead you only observe the final value after some steps (e.g. a stock price or the movement of an object), you would adjust:

Z_{\text{last}} = \begin{bmatrix} 0 & 0 & 0 & 1 \end{bmatrix}
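A tiny numpy check of the two aggregation rows, applied to one year of a single quarterly series. The stacked vector is assumed to be in chronological order (Q1 to Q4), which is the reading under which Z_last = [0 0 0 1] picks the final quarter; with a contemporaneous-first ordering such as [x_t, x_{t-1}, x_{t-2}, x_{t-3}], the entries would be reversed and Z_last would be [1 0 0 0]. The averaging row is an extra, hypothetical aggregation function added for comparison.

```python
import numpy as np

# Four quarterly values of one series in chronological order (Q1, Q2, Q3, Q4).
# This ordering is an assumption chosen so the Z rows match the ones above.
quarters = np.array([2.0, 3.0, 5.0, 7.0])

Z_sum = np.array([[1, 1, 1, 1]])     # flow: annual value = sum of the quarters
Z_last = np.array([[0, 0, 0, 1]])    # stock: annual value = last quarter's value
Z_mean = np.full((1, 4), 0.25)       # hypothetical: annual value = quarterly average

annual_flow = Z_sum @ quarters       # [17.0]
annual_stock = Z_last @ quarters     # [7.0]
annual_avg = Z_mean @ quarters       # [4.25]
```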

To accommodate this, we need to augment x_t with the lags we need to do the aggregation in the observation equation, which in turn requires that we augment T in the following ways:

  • The top-left block is the original transition matrix T, which evolves the contemporaneous state x_t exactly as before.
  • The first sub-diagonal block is the identity: it copies x_{t-1} from the previous time step’s contemporaneous slot into the first lag slot at time t.
  • Each subsequent sub-diagonal block is also the identity, pushing lag-k into the lag-(k+1) slot one step further back.
  • Everything above the block diagonal is zero — lags don’t feed back into the contemporaneous state beyond what T already captures, and lags don’t influence each other except through this one-step shift.

Which leads to the following block lower-triangular matrix:

\tilde{T} = \begin{bmatrix} T & 0 \\ F & I_n \otimes C \end{bmatrix}

C is the (s-1)\times(s-1) lower-shift matrix, where s is the aggregation period (4 for quarterly→annual). For s=4:

C = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}

That handles the “push lag k into lag k+1” step. It’s constant — same for every aggregated variable — so it gets Kroneckered with I_n to give a block-diagonal copy per aggregated variable. F has a single unit entry per aggregated variable, copying the contemporaneous state into the first lag slot at the next step.
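A numpy sketch of building \tilde{T} from T, C, and F as described. The function name `augment_transition`, the `agg_idx` argument, and the state ordering (the original state first, then lags 1 to s-1 for each aggregated variable) are assumptions for illustration; this is not the gEconpy implementation.

```python
import numpy as np


def augment_transition(T, agg_idx, s):
    """Build the block lower-triangular T_tilde = [[T, 0], [F, kron(I_n, C)]].

    T: (m, m) high-frequency transition matrix.
    agg_idx: indices of the n state variables that get aggregated.
    s: aggregation period; each aggregated variable carries s - 1 lags.
    Assumed ordering: [x_t, then lags 1..s-1 of each aggregated variable].
    """
    m = T.shape[0]
    n = len(agg_idx)
    L = s - 1
    C = np.eye(L, k=-1)                  # lower-shift: lag k -> lag k + 1
    F = np.zeros((n * L, m))
    for j, i in enumerate(agg_idx):
        F[j * L, i] = 1.0                # state i at t-1 -> first lag slot of variable j
    T_tilde = np.zeros((m + n * L, m + n * L))
    T_tilde[:m, :m] = T                  # original dynamics; upper-right block stays 0
    T_tilde[m:, :m] = F
    T_tilde[m:, m:] = np.kron(np.eye(n), C)
    return T_tilde


# Scalar state with T = 1 and no shock, quarterly -> annual (s = 4)
T_tilde = augment_transition(np.array([[1.0]]), [0], 4)
state = np.array([4.0, 3.0, 2.0, 1.0])   # [x_{t-1}, x_{t-2}, x_{t-3}, x_{t-4}]
next_state = T_tilde @ state             # [4.0, 4.0, 3.0, 2.0]
```

In the example, the old contemporaneous value is copied into lag 1, every lag moves one slot back, and the oldest one drops off. Shocks only enter the contemporaneous block, so the augmented selection matrix is R stacked on top of zeros.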

So, does any of this actually address what you were asking? If your question was about the state-to-observation mapping at mixed frequency, the recipe above is the whole thing in the case of a linear state space. For nonlinear models the logic is largely similar: you can imagine an arbitrary PyMC model with one latent variable per actual observation, then some kind of scatter aggregation on them to build the mean of the observed variable. The Kalman filter (KF) is nice because you get marginalization of all those latent variables "for free", but it's by no means the only way.
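For the nonlinear route, the "scatter aggregation" step is just an indexed sum. Here it is in plain numpy; in a PyMC model `latent` would be a random variable and the result would feed the mean of the observed likelihood, but that wiring is not shown, and `scatter_aggregate` is a hypothetical name. `np.add.at` is used rather than `totals[group] += latent` because fancy-index assignment does not accumulate over repeated indices.

```python
import numpy as np


def scatter_aggregate(latent, group, n_groups, how="sum"):
    """Aggregate high-frequency latent values into one value per observation.

    latent: 1-D array, one entry per high-frequency step.
    group: integer array of the same length, the observation each step feeds.
    how: "sum" or "mean" (mean assumes every group has at least one step).
    """
    totals = np.zeros(n_groups)
    np.add.at(totals, group, latent)     # unbuffered scatter-add: repeats accumulate
    if how == "sum":
        return totals
    if how == "mean":
        counts = np.bincount(group, minlength=n_groups)
        return totals / counts
    raise ValueError(f"unknown aggregation: {how}")


# Six high-frequency steps feeding three observations of unequal span
latent = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array([0, 0, 0, 1, 1, 2])
sums = scatter_aggregate(latent, group, 3)            # [6.0, 9.0, 6.0]
means = scatter_aggregate(latent, group, 3, "mean")   # [2.0, 4.5, 6.0]
```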

If I didn’t understand the question let me know and I can try again.
