# Estimating a dynamic switching process

I’m doing research on multiple-model state estimators for a type of discrete-time dynamical system with a switching disturbance model. The estimators require estimates of the model parameters, and I’m wondering if I can estimate these from time-series data using a numerical method in PyMC, but I’m not sure which method to use or how to set up the problem.

The system consists of a linear dynamic system with the disturbance at the input. The disturbance is modelled as an integrator whose input is a special random variable known as a ‘random shock’, which is sampled from one of two normal distributions depending on the system mode (a discrete random state of the system).

So in code, the data-generating model, which has three parameters (sigma1, sigma2, and epsilon), would be something like this:

```python
import numpy as np

rng = np.random.default_rng()

def random_shock(sigma1, sigma2, epsilon):
    # Sample the system mode: r = 1 with probability epsilon
    r = rng.binomial(1, epsilon)
    if r == 0:
        w_p = rng.normal(0.0, sigma1)
    else:
        w_p = rng.normal(0.0, sigma2)
    return w_p
```


(It is a Gaussian mixture model with two modes).

This signal, w_p, is then fed through an integrator to produce the disturbance signal p at time k:

p(k) = p(k-1) + w_p(k)


p(k) is then a ‘randomly-occurring step change’ process: roughly constant between shocks, with a jump whenever a shock occurs.

This is then fed through a model of the process, which for now we assume is known. For simplicity, let’s assume the process model is a simple first-order stable system:

y(k) = 0.7 * y(k-1) + p(k)


The catch is that there is measurement error (noise) added to the output. So the measurement model is:

y_M(k) = y(k) + v(k)


where v(k) is the measurement noise.
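Putting the pieces together, the whole data-generating process can be simulated in a few lines of NumPy. (The parameter values and the zero initial conditions below are illustrative assumptions, not part of the original specification.)

```python
import numpy as np

def simulate(N, sigma1, sigma2, epsilon, rho=0.7, sigma_v=0.1, seed=0):
    """Simulate the switching-disturbance system and its noisy measurements."""
    rng = np.random.default_rng(seed)
    # Mode indicator r(k) and random shock w_p(k): a two-mode Gaussian mixture
    r = rng.binomial(1, epsilon, size=N)
    w_p = np.where(r == 0,
                   rng.normal(0.0, sigma1, N),
                   rng.normal(0.0, sigma2, N))
    # Integrator: p(k) = p(k-1) + w_p(k), starting from p(-1) = 0
    p = np.cumsum(w_p)
    # First-order process: y(k) = rho * y(k-1) + p(k), starting from y(-1) = 0
    y = np.zeros(N)
    y[0] = p[0]
    for k in range(1, N):
        y[k] = rho * y[k - 1] + p[k]
    # Measurement model: y_M(k) = y(k) + v(k)
    y_M = y + rng.normal(0.0, sigma_v, N)
    return r, p, y, y_M

r, p, y, y_M = simulate(N=500, sigma1=0.01, sigma2=1.0, epsilon=0.01)
```

With a small sigma1, a large sigma2, and a small epsilon, the resulting p(k) shows the ‘randomly-occurring step change’ behaviour described above.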

Here is a diagram of the whole system:

My question is, if I had a time-series of measurements of y_M(k) of length N, how could I estimate the random shock parameters, sigma1, sigma2, and epsilon?

I’m pretty sure this is a case of joint state and parameter estimation, i.e. we need to estimate r(k), k = 0, 1, …, N, and sigma1, sigma2, and epsilon at the same time.

A follow-up note which might help:

I think the problem is that there is a very large number of possible combinations of the r(k) sequence for large N. I’m hoping there is some kind of method that can explore the possible solutions according to their likelihood and ignore unlikely ones (i.e. an approximate approach). However, shock occurrence is quite unlikely (epsilon = 0.01), so at least some sequences containing shocks have to be included. Maybe this part is a dynamic programming problem?
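To put rough numbers on this (using N = 100 as an illustrative length): the r(k) sequence has 2^N possible configurations, but with epsilon = 0.01 almost all of the prior probability mass sits on sequences containing only a handful of shocks, since the number of shocks is Binomial(N, epsilon):

```python
from math import comb

N, eps = 100, 0.01

# Total number of possible shock sequences r(0), ..., r(N-1)
total = 2 ** N

# Prior probability of exactly k shocks in the sequence: Binomial(N, eps)
def prob_k_shocks(k):
    return comb(N, k) * eps**k * (1 - eps) ** (N - k)

mass_up_to_3 = sum(prob_k_shocks(k) for k in range(4))
print(f"{total:.3e} sequences, P(at most 3 shocks) = {mass_up_to_3:.3f}")
```

So while there are on the order of 10^30 candidate sequences, around 98% of the prior mass is on sequences with three or fewer shocks, which is what makes an approximate method that concentrates on likely sequences plausible.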

This is a tough one.

One approach could be to model the number (and time) of shocks instead of the probability of shocks. Here are two related examples:

It’s still tricky to sample, because the most likely parameters of the random walk (or autoregressive process) are likely to be very different for different numbers of switches (so the posterior is multimodal on the margins), and it might be difficult for the sampler to move between those modes.

I am very curious if there’s a more established way that others may know. Maybe @jessegrabowski, @RavinKumar?

If I understand the system right, it looks like:

\begin{align} a_t &= \rho a_{t-1} + \eta_t \\ \eta_t &= \eta_{t-1} + \omega_t \\ \omega_t &= \begin{cases} \epsilon_{t,1} \quad \text{if} \quad r > p \\ \epsilon_{t,2} \quad \text{otherwise}\end{cases} \\ y_t &= a_t + \nu_t \end{align}

Assuming I understand the setup correctly (a heroic assumption), I think it might be useful to think about the shock indicator r \sim \text{Bernoulli}(p) as a time-varying parameter \varphi_t instead. This changes the system to:

\begin{align} a_t &= \rho a_{t-1} + \eta_t \\ \eta_t &= \eta_{t-1} + \omega_t \\ \omega_t &= \varphi_t \epsilon_{t,1} + (1 - \varphi_t) \epsilon_{t,2} \\ y_t &= a_t + \nu_t \end{align}

Now you’re basically at a standard linear state-space model, with the following transition equations:

\underbrace{\begin{bmatrix} a_t \\ \eta_t \end{bmatrix}}_{x_t} = \underbrace{\begin{bmatrix} \rho & 1 \\ 0 & 1 \end{bmatrix}}_A \underbrace{\begin{bmatrix} a_{t-1} \\ \eta_{t-1} \end{bmatrix}}_{x_{t-1}} + \underbrace{ \begin{bmatrix} \varphi_t & (1 - \varphi_t) \\ \varphi_t & (1 - \varphi_t) \end{bmatrix}}_{R_t} \underbrace{ \begin{bmatrix} \epsilon_{t,1} \\ \epsilon_{t, 2} \end{bmatrix}}_{\epsilon_t}

And observation equation:
y_t = \underbrace{\begin{bmatrix}1 & 0 \end{bmatrix}}_Z x_t + \nu_t

with:

\begin{align} \epsilon_t &\sim MVN(0, \Sigma) \\ \Sigma &= \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \\ \varphi_t &\sim \text{Bernoulli}(p) \\ \nu_t &\sim N(0, \sigma^2_\nu) \end{align}

The only wrinkle stopping you from having a bog-standard LSM is the fact that the R matrix is time-varying. Since there are actually no time dynamics on R, you could just sample T matrices (one per time step) and apply the Kalman filter as normal. But I believe you could also implement this model directly in PyMC using a scan, by adapting this example of a hidden Markov model. Instead of sampling a transition probability at each time step, you would sample three variables: \varphi_t, \epsilon_{t,1}, and \epsilon_{t,2}. The result of the scan would be the x_t vector, which you could slice to get the mean of y_t, which is just a bunch of i.i.d. normals.
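A minimal NumPy sketch of the “sample the \varphi_t sequence, then apply the Kalman filter as normal” idea, under the state-space form above. (The parameter values, the synthetic data, and the initial state covariance are all illustrative assumptions; a real implementation would sample \varphi_t and the parameters inside PyMC.)

```python
import numpy as np

def kalman_loglik(y, phi, rho, sigma1, sigma2, sigma_nu):
    """Log-likelihood of y given a shock-indicator sequence phi_t, for
    x_t = A x_{t-1} + R_t eps_t,  y_t = Z x_t + nu_t."""
    A = np.array([[rho, 1.0], [0.0, 1.0]])
    Z = np.array([[1.0, 0.0]])
    Sigma = np.diag([sigma1**2, sigma2**2])
    x = np.zeros(2)              # state mean [a_t, eta_t]
    P = np.eye(2) * 10.0         # vague initial state covariance (assumption)
    loglik = 0.0
    for t in range(len(y)):
        # Time-varying shock-selection matrix R_t
        R = np.array([[phi[t], 1 - phi[t]],
                      [phi[t], 1 - phi[t]]], dtype=float)
        # Predict
        x = A @ x
        P = A @ P @ A.T + R @ Sigma @ R.T
        # Update
        innov = y[t] - (Z @ x)[0]
        S = (Z @ P @ Z.T)[0, 0] + sigma_nu**2
        K = (P @ Z.T)[:, 0] / S
        x = x + K * innov
        P = P - np.outer(K, Z @ P)
        loglik += -0.5 * (np.log(2 * np.pi * S) + innov**2 / S)
    return loglik

# Exercise the filter on synthetic placeholder data and a sampled phi sequence
rng = np.random.default_rng(1)
phi = rng.binomial(1, 0.01, size=200)
y_data = np.cumsum(rng.normal(size=200))
ll = kalman_loglik(y_data, phi, rho=0.7, sigma1=0.01, sigma2=1.0, sigma_nu=0.1)
```

Inside a sampler, `kalman_loglik` would be evaluated once per draw of the \varphi_t sequence and the parameters, so the continuous states are marginalized out by the filter rather than sampled.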


Thanks jessegrabowski. You understand the system correctly, and yes, it can be represented by a linear state-space model with a switching input matrix, or with a time-varying discrete parameter in the input matrix, \varphi_t, as you show (although I think the top row of your R_t matrix should be zeros). I’m not sure how this helps though.

And yes, it is a hidden Markov model, so I will check out this example you linked to, and get back to you. Thanks very much!

(By the way, I didn’t realize you can use LaTeX in this forum; if I’d known, I might have written the question more clearly, sorry.)

It’s not really a hidden Markov model unless there are time dynamics in the probabilities of getting one shock or the other, no? From what I understood, there are no transition probabilities between the two shocks; it’s just a static probability of getting one or the other.

I don’t think the top row of R should be zeros. Substituting everything gives:

a_t = \rho a_{t-1} + \eta_{t-1} + \varphi_t \epsilon_{t,1} + (1 - \varphi_t) \epsilon_{t,2}

Setting the top row of R to zero would wipe out the time t shocks in a_t.

My understanding of your definition is that there are two independent noises, \epsilon_1 and \epsilon_2, and the R_t matrix determines which one feeds the system (depending on whether \varphi_t is 0 or 1). But whichever noise is active, it should feed the second state, \eta_t, not a_t. So shouldn’t the shock term R_t \epsilon_t be:

\begin{bmatrix} 0 & 0 \\ \varphi_t & (1-\varphi_t) \end{bmatrix} \begin{bmatrix} \epsilon_{t,1} \\ \epsilon_{t,2} \end{bmatrix}

which produces

\begin{bmatrix} 0 \\ \varphi_t \epsilon_{t,1} + (1-\varphi_t) \epsilon_{t,2} \end{bmatrix}

Thus

\eta_t = \eta_{t-1} + \omega_t = \eta_{t-1} + \varphi_t \epsilon_{t,1} + (1-\varphi_t) \epsilon_{t,2}

Anyway, I might be wrong, it’s not important to the question.

The HMM would have the following transition matrix

\Pi_{i,j} = \begin{bmatrix} 1-p & p\\ 1-p & p \end{bmatrix}

where p is the shock probability, e.g. p=0.01 (I labelled it epsilon in the question).
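Incidentally, this transition matrix also illustrates the earlier point about time dynamics: because both rows of \Pi are identical, the next mode does not depend on the current one, \Pi^2 = \Pi, and r(k) reduces to i.i.d. Bernoulli(p) draws. A quick check:

```python
import numpy as np

p = 0.01
Pi = np.array([[1 - p, p],
               [1 - p, p]])

# Identical rows mean the chain is memoryless: Pi @ Pi equals Pi,
# so the mode sequence is equivalent to i.i.d. Bernoulli(p) draws.
assert np.allclose(Pi @ Pi, Pi)

# The stationary distribution is just the common row [1 - p, p]
print(Pi[0])
```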

At the moment I’m having problems running the HMM example linked above. I’m getting errors but trying to figure them out…