As I see it (and please correct me if I'm wrong), we can essentially build all sorts of Bayesian dynamic linear models without a dedicated BDLM (Bayesian dynamic linear model) "package", even though it might require a bit more work, since the matrix notation we can use in a DLM offers a nice shortcut.
However, I am wondering whether there are any models we can't really express without the package. Secondly, I assume the package deploys a Kalman filter / MCMC hybrid algorithm – how does this compare to the standard (pure) NUTS sampler in terms of computational efficiency, precision, etc.?
Just to be clear, the KF is an efficient method of computing the likelihood of an observed sequence given the model parameters, so it doesn't really have anything to do with NUTS per se, except that NUTS consumes likelihood functions in order to generate posterior samples. But that is also true of anything else, including good old numerical minimization, which is why I say it has nothing to do with NUTS specifically.
You are correct that you can implement basically any model you like in PyMC without resorting to the specific matrix machinery of a linear state-space model – I'd even argue that the code is more readable without it. I'd say the main advantage of the KF is automatic marginalization of hidden states and missing data. This is a very non-trivial point in these models that is easy to get wrong – without the KF, at each step of the sequence you need to manually compute the marginal distributions over hidden states/missing data given the observed data. There's a (very rough) example of how this ends up looking here.
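To make that concrete, here's a minimal sketch of what the KF automates in the simplest univariate local-level case (a random-walk hidden state observed with noise). Illustrative only – a real implementation handles multivariate states and missing observations:

```python
import numpy as np

def local_level_loglike(y, sigma_state, sigma_obs, m0=0.0, P0=1e6):
    """Marginal log-likelihood of y under a local-level model,
    integrating out the hidden states one step at a time."""
    m, P = m0, P0  # current mean/variance of the hidden state
    ll = 0.0
    for y_t in y:
        P = P + sigma_state**2          # predict: propagate state variance
        S = P + sigma_obs**2            # one-step-ahead variance of y_t
        v = y_t - m                     # one-step-ahead forecast error
        ll += -0.5 * (np.log(2 * np.pi * S) + v**2 / S)
        K = P / S                       # Kalman gain
        m, P = m + K * v, (1 - K) * P   # update: condition state on y_t
    return ll
```

Each pass through the loop is exactly the "manually compute the marginal distribution" step described above, done analytically because everything is Gaussian.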
By the way, if you're interested in working with dynamic linear models with Kalman filtering, there's a new statespace sub-module in pymc-experimental that provides them. You can see how the API works for structural models here. It's still a work in progress, so if you try it and find any bugs, please report them!
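For a quick flavour, model-building in the structural module looks roughly like the following. This is an untested sketch based on the example notebook – the module is in flux, so component names, required priors, and shapes may differ from what's shown; `ss_mod.param_info` is the authoritative list:

```python
import numpy as np
import pymc as pm
from pymc_experimental.statespace import structural as st

data = np.random.normal(size=(100, 1))  # placeholder series

# Compose the model from components, then build a state-space object
trend = st.LevelTrendComponent(order=2, innovations_order=[0, 1])
error = st.MeasurementError(name="obs")
ss_mod = (trend + error).build()

with pm.Model(coords=ss_mod.coords) as model:
    # Priors must use the names (and shapes) the state-space object
    # expects -- consult ss_mod.param_info for the current requirements
    initial_trend = pm.Normal("initial_trend", shape=2)
    sigma_trend = pm.HalfNormal("sigma_trend", shape=1)
    sigma_obs = pm.Exponential("sigma_obs", 1.0)

    # Attaches the Kalman-filter likelihood of `data` to the graph
    ss_mod.build_statespace_graph(data)
    idata = pm.sample()
```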
Ah, I see – the KF's sole responsibility is basically feeding likelihoods to the NUTS sampler.
I am trying to intuitively understand the difference between using a static regression approach with time-varying parameters and a state-space approach. In some cases they seem so similar that I think I must be missing something in the very basics – I am a beginner.
Consider, e.g., the model below, which we can estimate with the pure NUTS sampler.
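For illustration, a regression with a time-varying coefficient but a static noise term (a sketch; names and priors are made up):

```python
import numpy as np
import pymc as pm

T = 100
x = np.random.normal(size=T)  # a single regressor
y = np.random.normal(size=T)  # placeholder observations

with pm.Model() as tvp_model:
    sigma_beta = pm.HalfNormal("sigma_beta", 1.0)
    # Time-varying coefficient: a Gaussian random walk over time
    beta = pm.GaussianRandomWalk(
        "beta", sigma=sigma_beta, init_dist=pm.Normal.dist(0.0, 1.0), shape=T
    )
    # Static (time-independent) observation noise
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    pm.Normal("y_obs", mu=beta * x, sigma=sigma_obs, observed=y)
```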
After reading through some of the material at the link you provided to the new statespace sub-module, it seems we can model the exact same thing under a state-space approach. I wonder what the differences are – the error term, for example, which is not time-dependent in the model above, strikes me as one difference.
I can see how a state-space approach (obviously) leads to more parsimonious models when modelling time series in many cases. Where we would otherwise need dummies, for example, we can instead model the effect as a latent process, with just a single "init" or "start" parameter for that process.
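E.g., something like this sketch (illustrative, with made-up priors), where one init distribution and one innovation scale stand in for a whole battery of dummy coefficients:

```python
import numpy as np
import pymc as pm

T = 200
y = np.random.normal(size=T)  # placeholder observations

with pm.Model() as latent_model:
    sigma_level = pm.HalfNormal("sigma_level", 1.0)
    # One init distribution plus one innovation scale, instead of a
    # separate dummy coefficient per period/group
    level = pm.GaussianRandomWalk(
        "level", sigma=sigma_level, init_dist=pm.Normal.dist(0.0, 10.0), shape=T
    )
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    pm.Normal("y_obs", mu=level, sigma=sigma_obs, observed=y)
```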
This also leads me to another question about the Gaussian random walk prior in PyMC. I assume models that use it are not relying on the Kalman filter but solely on NUTS – has this been done by manually computing the marginal distributions over the hidden states?
Thanks for answering, I highly appreciate it.
Do you know of a tutorial/overview of the method (a Kalman filter feeding likelihoods to an MCMC sampler) used to create the statespace module here in PyMC?
Sorry I didn't reply more quickly. Your question about the difference between the model you posted and a recursive model is one I've been thinking a lot about lately – I think it deserves an entire tutorial notebook. I don't have a good intuition about the theoretical differences; I think there's a 1:1 mapping in many cases. I think KF errors are also independent conditional on the previous state, so that's not the difference (otherwise they wouldn't be normally distributed and the math that drives the KF wouldn't work).
I think in the case of the GRW, you don't need any heavy machinery, because the time series is equivalent to first sampling a sequence of independent normals and then taking a cumulative sum. This is a deterministic operation, so it's no different from sampling and then shifting/scaling (as in non-centered priors). For more complex situations, especially where you are adding/multiplying random variables themselves, rather than realizations of those variables, you have to do something more elaborate. This is why you can't just naively sample a sequence of independent normals and use them as "unobserved errors" in an ARMA model, for example.
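Concretely, these two parameterizations define the same prior (a minimal sketch):

```python
import pymc as pm
import pytensor.tensor as pt

T = 100

with pm.Model() as grw_builtin:
    sigma = pm.HalfNormal("sigma", 1.0)
    x = pm.GaussianRandomWalk(
        "x", sigma=sigma, init_dist=pm.Normal.dist(0.0, 1.0), shape=T
    )

with pm.Model() as grw_by_hand:
    sigma = pm.HalfNormal("sigma", 1.0)
    x0 = pm.Normal("x0", 0.0, 1.0)
    # Sample iid innovations, then cumulatively sum them -- a purely
    # deterministic transform, like non-centered shifting/scaling
    innovations = pm.Normal("innovations", 0.0, 1.0, shape=T - 1)
    x = pm.Deterministic(
        "x", pt.concatenate([x0[None], x0 + pt.cumsum(sigma * innovations)])
    )
```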
Regarding your last question, this is the best KF resource on the internet, for my money (disclaimer: I'm broke). It has nothing to do with NUTS, but there isn't much to say about that connection: you just take a KF, compute the likelihood, and you're done. It's like asking "what's the connection between a normal distribution and NUTS?"
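If it helps to see that connection spelled out, here's the same local-level filter as above, written with pytensor's scan so NUTS can differentiate through it, with the log-likelihood handed over via a pm.Potential (a minimal sketch, no missing-data handling):

```python
import numpy as np
import pymc as pm
import pytensor
import pytensor.tensor as pt

y = np.random.normal(size=100)  # placeholder observations

def kf_step(y_t, m, P, sigma_state, sigma_obs):
    # Predict step: propagate the hidden state one period forward
    P_pred = P + sigma_state**2
    # One-step-ahead predictive distribution of y_t
    S = P_pred + sigma_obs**2
    v = y_t - m
    ll_t = -0.5 * (pt.log(2 * np.pi * S) + v**2 / S)
    # Update step: condition the hidden state on the new observation
    K = P_pred / S
    return m + K * v, (1.0 - K) * P_pred, ll_t

with pm.Model() as kf_model:
    sigma_state = pm.HalfNormal("sigma_state", 1.0)
    sigma_obs = pm.HalfNormal("sigma_obs", 1.0)
    (_, _, ll), _ = pytensor.scan(
        kf_step,
        sequences=pt.as_tensor(y),
        outputs_info=[pt.constant(0.0), pt.constant(1e6), None],
        non_sequences=[sigma_state, sigma_obs],
    )
    # The KF's only job here: supply the marginal likelihood to NUTS
    pm.Potential("kf_loglike", ll.sum())
    idata = pm.sample()
```

NUTS only ever sees the scalar log-likelihood; the hidden states never appear as free parameters.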
A quick comment that I think might help clarify some confusion: the KF is also an inference algorithm, in the sense that it estimates the latent states of the state-space model. But in a DLM setup the latent states are marginalized out via the KF, and we are actually interested in the "hyper"-parameters, like the AR/MA coefficients, which are inferred via NUTS.