I have been reading about and testing out the new state space models in PyMC.
Stepping outside my comfort zone, I now want to build custom state space models.
Here I want to model a graph-like structure where the edges are time series that impact the adjacent nodes. The edge values are updated in an event-based fashion, where only one event occurs at a time. As a result, I want to predict the values along a path through the nodes, as well as how shocks impact the path.
I am really looking for tips and additional material on what “bells & whistles” I need to change. With pytensor.scan I could model simple dependencies, but I got stuck on some more complex ones.
Thanks a lot for the reply.
Yes, a hidden Markov model would describe it quite nicely.
As far as dimensionality is concerned, I have about 100 nodes, with 100x100 possible connections, where discrete values are observed. An observation occurs when an event t that uses an edge reaches its goal and records a (possible) delay.
My main concern is that I only observe events (trip x passed an edge with delay …, at planned time …).
From that I want to model the latent “graph” of 100 nodes, since this new state of the graph should then be used in predicting the next event t+1.
In the end I would like to model the certainty of the delay values over time, given an assumed route.
This is helpful information, but could be interpreted a couple of ways. Do you mean that at each point in time, a node could be in one of 100 states, or do you mean that the overall system’s time evolution unfolds in 100 timesteps?
from that I want to model the latent “graph” of 100 nodes.
Would you happen to have a reference or URL for something qualitatively similar to what you want?
In the state space modeling literature, there’s some important nomenclature that would be good to use here.
The number of timesteps, T, to be modeled
The dimension of the unobserved state
The dimensionality of the observations
Any additional predictor variables
As best as I can tell here, the dimensionality of the state vector is 100, and since the states are discrete, you can model this with a 100x100 transition matrix, to which you alluded.
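Just to pin down what that object is, here's a tiny NumPy sketch (not PyMC, and with made-up numbers) of a 100-state discrete Markov chain: a row-stochastic 100x100 transition matrix plus a forward simulation of the hidden state.

```python
# Tiny NumPy sketch (not PyMC) of a 100-state discrete Markov chain.
# The row-stochastic matrix P is the 100x100 transition matrix mentioned
# above; all numbers here are made up for illustration.
import numpy as np

rng = np.random.default_rng(0)

n_states = 100  # dimension of the unobserved state
T = 50          # number of timesteps to simulate

# P[i, j] = probability of moving from state i to state j; each row sums to 1
P = rng.dirichlet(np.ones(n_states), size=n_states)

states = np.empty(T, dtype=int)
states[0] = rng.integers(n_states)
for t in range(1, T):
    states[t] = rng.choice(n_states, p=P[states[t - 1]])
```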
To try to short-circuit this conversation, I’ll warn you that directly modeling a discrete transition matrix of size 100x100 is not an ideal use case for PyMC (and in fact, is very hard without some additional assumptions or bespoke algorithms) because the underlying MCMC sampler works best on continuous variates. If the states are continuous, it’s a different matter.
Sorry to be so imprecise earlier (this is a learning experience for me, and I hope not to waste any of your time). Essentially, I found a paper in TRR, “Delay Propagation in Large Railway Networks with Data-Driven Bayesian Modeling”, that modeled the delay propagation of railways in Sydney, Australia as a Markov chain. I want to recreate this approach for the open-source Belgian railway data.
In this dataset there are 100 unique daily trips (paths through the graph, each with about 10 unique stops), and the graph has around 600 nodes with 1500 edges (most nodes have 2-3 adjacent nodes).
Here I would like to model the delay (conditional posterior) of individual trips for a specific day.
I still have some problems understanding how to incorporate previously recorded stops of trains, as well as possible delays on specific edges, into the Markov chain. I have already worked through some toy examples, e.g. “Markov Models From The Bottom Up, with Python” by Ericmjl, which gives an introduction to Markov modeling. But the time-specific events in particular (stops that are recorded and then influence adjacent stations) are hard for me to model.
Got it. I looked at the abstract for your reference and found this line:
In the model, the propagation satisfies the Markov property if one can predict future delay propagation in the network based solely on its present state just as well as one could knowing the process’s full history, so that it is independent of such historical procedures.
This is an important piece of information. Essentially, it looks like they are representing the entire system as a set of smaller conditional models, one for each station. This is attractive because if the system is indeed Markovian, then the path that a train took to a certain station doesn’t add any extra information.
Then, you can simulate full paths through the system through ancestral sampling, i.e. just sampling each conditional one at a time to get draws of a full path through the train network. The actual parameters you estimate would be the conditional probabilities of a train going from node A to downstream node B, C, or D, and possibly also the time required to make the trip.
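As a rough sketch of what I mean by ancestral sampling (with entirely made-up station names, transition probabilities, and delay parameters; in practice these would come from posterior draws of your fitted model):

```python
# Hedged sketch of ancestral sampling through per-station conditionals.
# The dictionaries below are hypothetical placeholders; in the real model
# they would hold posterior draws of the conditional transition
# probabilities and the per-edge delay parameters.
import numpy as np

rng = np.random.default_rng(0)

# P(next station | current station)
next_station_probs = {
    "A": (["B", "C"], [0.7, 0.3]),
    "B": (["D"], [1.0]),
    "C": (["D"], [1.0]),
    "D": ([], []),  # terminal station
}

# Per-edge (mu, sigma) of a log-normal trip/delay time
edge_delay_params = {
    ("A", "B"): (1.0, 0.3),
    ("A", "C"): (1.2, 0.4),
    ("B", "D"): (0.8, 0.2),
    ("C", "D"): (0.9, 0.2),
}

def sample_path(start):
    """Draw one full path and its delays, one conditional at a time."""
    path, delays = [start], []
    current = start
    while next_station_probs[current][0]:
        candidates, probs = next_station_probs[current]
        nxt = str(rng.choice(candidates, p=probs))
        mu, sigma = edge_delay_params[(current, nxt)]
        delays.append(rng.lognormal(mean=mu, sigma=sigma))
        path.append(nxt)
        current = nxt
    return path, delays

print(sample_path("A"))
```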
The authors of that paper don’t seem to be using prior recorded train stops to inform the probability of later train stops being delayed, although I can’t be sure since I don’t have full access to the paper. Do you want to reproduce their work or improve on it?
Phrased differently, is there some sort of difficult-to-observe state for the train’s passage (maybe personnel, passengers, station capacity) which influences the time of arrival?
Generally I would first like to reproduce their work, as this is challenging enough for me right now. In the future, some sort of latent variable like “train disruption” or “number of passengers transported” would definitely lead to some interesting insights into the railway network.
I still can’t quite wrap my head around how I can “store” the previous delays at some point in time, to pass on to the next travel-time predictions in the .scan functionality, especially when, for example, a past observation of a delay on one edge is some reasonable time away from the currently predicted timestep.
Huge thanks for spending so much time on my problem. I am really excited to leverage PyMC for this use case, as the code from the paper unfortunately isn’t public.
My hunch is that you don’t really need scan here. Everything I’ve seen so far seems to be consistent with a well-structured linear model. Let’s say that you want to model the time from node 3 to node 4 given previous trip legs 1 → 2 and 2 → 3. Then my claim is that you may want to start with a model like this: y_{34} \sim \text{LogNormal}(\mu_{34} + \beta_{23} y_{23} + \beta_{12} y_{12}, \dots)
where I have omitted the scale parameter and chosen the log-normal distribution purely for convenience, to ensure positivity. My interpretation of this model is that the trip time on leg 3 to 4 is well-represented by some average trip time \mu_{34} plus some extra adjustment based on the previous legs, ideally accounting for factors causing delays that may spill over across legs.
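If it helps, here is a minimal PyMC sketch of that model; the data arrays and priors are fabricated just to show the shape of the thing, not taken from the paper:

```python
# Minimal PyMC sketch of the log-normal model above. The observed leg
# times below are fabricated placeholders, one entry per trip.
import numpy as np
import pymc as pm

y12_obs = np.array([5.1, 4.8, 6.0])   # leg 1 -> 2 times
y23_obs = np.array([7.2, 7.0, 8.1])   # leg 2 -> 3 times
y34_obs = np.array([6.5, 6.3, 7.4])   # leg 3 -> 4 times (modeled)

with pm.Model() as leg_model:
    mu_34 = pm.Normal("mu_34", mu=0.0, sigma=1.0)      # baseline (log) trip time
    beta_12 = pm.Normal("beta_12", mu=0.0, sigma=0.5)  # spill-over from leg 1 -> 2
    beta_23 = pm.Normal("beta_23", mu=0.0, sigma=0.5)  # spill-over from leg 2 -> 3
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    mu = mu_34 + beta_23 * y23_obs + beta_12 * y12_obs

    pm.LogNormal("y_34", mu=mu, sigma=sigma, observed=y34_obs)

    idata = pm.sample()
```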
Is this at all close to your intent?
Would you be able to share a slimmed-down version of your code with a minimal example of what you’re trying to implement? It doesn’t need to be running properly; I’d just like to see what you have down so far.
Riffing on that idea, you can probably use an adjacency matrix and do a kind of linear message passing. If you have an adjacency matrix A and parameter vectors alpha and b (where alpha[i] is the conditional expected delay for station i and b[i] controls the change in expected delay resulting from the i-th station being on your travel path), then the model can be written as mu = alpha + A @ (b[:, None] * y) for a single-“lag” model. Two lags would be alpha + A @ A @ (b[:, None] * y) + A @ (b[:, None] * y), and so on.
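Here is a hedged sketch of that single-lag version in PyMC, with a tiny made-up graph; I've written the message-passing term with flat vectors, so A @ (b * y) plays the role of A @ (b[:, None] * y) above:

```python
# Rough sketch of the adjacency-matrix "message passing" model with a
# small fabricated graph; none of these numbers come from real data.
import numpy as np
import pymc as pm

rng = np.random.default_rng(0)
n_stations = 5

# 0/1 adjacency matrix of the station graph (random here for illustration)
A = (rng.random((n_stations, n_stations)) < 0.4).astype(float)
np.fill_diagonal(A, 0.0)

y = rng.exponential(scale=2.0, size=n_stations)       # current delays per station
y_next = rng.exponential(scale=2.0, size=n_stations)  # delays to be modeled

with pm.Model() as message_passing_model:
    alpha = pm.Normal("alpha", mu=0.0, sigma=1.0, shape=n_stations)  # baseline delay
    b = pm.Normal("b", mu=0.0, sigma=0.5, shape=n_stations)          # per-station influence
    sigma = pm.HalfNormal("sigma", sigma=1.0)

    # Single "lag": each station's expected delay is its baseline plus a
    # weighted sum of the current delays at adjacent stations.
    mu = alpha + pm.math.dot(A, b * y)

    # A second lag would add pm.math.dot(A, pm.math.dot(A, b * y)) here.
    pm.Normal("y_next", mu=mu, sigma=sigma, observed=y_next)

    idata = pm.sample()
```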