If I understand the system right, it looks like:
\begin{align} a_t &= \rho a_{t-1} + \eta_t \\ \eta_t &= \eta_{t-1} + \omega_t \\ \omega_t &= \begin{cases} \epsilon_{t,1} \quad \text{if} \quad r > p \\ \epsilon_{t,2} \quad \text{otherwise}\end{cases} \\ y_t &= a_t + \nu_t \end{align}
Assuming I understand the setup correctly (a heroic assumption), I think it might be useful to think about the shock probably r \sim Bernoulli(p) as being a time-varying parameter \varphi_t instead. This changes the system to:
\begin{align} a_t &= \rho a_{t-1} + \eta_t \\ \eta_t &= \eta_{t-1} + \omega_t \\ \omega_t &= \varphi_t \epsilon_{t,1} + (1 - \varphi_t) \epsilon_{t,2} \\ y_t &= a_t + \nu_t \end{align}
Now you’re basically at a standard linear state-space model, with the following transition equations:
\underbrace{\begin{bmatrix} a_t \\ \eta_t \end{bmatrix}}_{x_t} = \underbrace{\begin{bmatrix} \rho & 1 \\ 0 & 1 \end{bmatrix}}_A \underbrace{\begin{bmatrix} a_{t-1} \\ \eta_{t-1} \end{bmatrix}}_{x_{t-1}} + \underbrace{ \begin{bmatrix} \varphi_t & (1 - \varphi_t) \\ \varphi_t & (1 - \varphi_t) \end{bmatrix}}_{R_t} \underbrace{ \begin{bmatrix} \epsilon_{t,1} \\ \epsilon_{t, 2} \end{bmatrix}}_{\epsilon_t}
And observation equation:
y_t = \underbrace{\begin{bmatrix}1 & 0 \end{bmatrix}}_Z x_t + \nu_t
with:
\begin{align} \epsilon_t &\sim MVN(0, \Sigma) \\ \Sigma &= \begin{bmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{bmatrix} \\ \varphi_t &\sim \text{Bernoulli}(p) \\ \nu_t &\sim N(0, \sigma^2_\nu) \end{align}
The only wrinkle stopping you from having a bog standard LSM is the fact that the R matrix is time varying. Since there’s actually no time dynamics on R, you could just sampling T matrices (one per time-step) and apply the Kalman Filter as normal. But it I believe you could also implement this model directly in PyMC using a scan by adapting this example of a hidden markov model. Instead of sampling a transition probability at each time step, you would sample 3 variables: \varphi_t and \epsilon_t. The result of the scan would be the x_t vector, which you could slice to get the mean of y_t, which is just a bunch of i.i.d normals.