To me this looks there are at least the following components:

- At each time t, there is a an unknown probability p_t that an event will occur.
- There is an autoregressive component, so that y_{t+1} = \rho y_t
- There is a (known?) maximum value y^\star, such that y_t = \max(y, y^\star)
- The “event impulse size”, the amount the trajectory increases when the event fires.

Here is some code to generate data in this setup:

```
T = 50
n_simulation = 100
# Parameters
p_event = 0.1
rho = 0.9
y_star = 3
impulse_size = 1.0
event_dummies = stats.bernoulli(p=p_event).rvs((n_simulation, T))
traj = np.zeros((n_simulation, T))
for t in range(1, T):
X = traj[:, t-1] * rho + event_dummies[:, t] * impulse_size
traj[:, t] = np.clip(X, 0, y_star)
```

Which produces data that looks like this, and I think matches your drawing?

You would then write a PyMC model to estimate p, rho, y_star, and impulse_size. Maybe y_star and impulse_size are known. Riffing on @junpenglao 's comment, you probably want a strongly regularized hierarchical prior on `p`

, so that you strongly pull all the estimates towards 0. This is especially true if you can leverage some information sharing, either between the trajectories, or between the time-steps. I assume the “event matrix” of size `(n_trajectory, T_timesteps)`

will be quite sparse.

You will need to scan to implement this model, which can be a bit complex if you’ve never done it. Also using a Bernoulli likelihood for the event matrix will mean you need to use a compound step, not pure NUTS, which is also a bit of a bummer for speed. You could experiment with other ways to get a sparse matrix of ones and zeros (`at.switch`

? `at.ge(p, 0.5) * 1.0`

?). I also used clip to keep values between 0 and y_star, but you might want to figure out a differentiable way to do that, probably something with a generalized inverse logistic function. I think PyMC doesn’t complain if you include `at.clip`

in your model, but it adds a non-differentiable boundary that the sampler really doesn’t like (that’s also true of the proposed workarounds for the event matrix above).

Anyway, here’ s a really simple model I ran to recover the parameters rho=0.9 and p=0.1. It worked fine, but it was super slow (~20 minutes to sample). Rho was hard to sample, it had low ess, high r-hat, but the estimate was dead on. Consider this a starting point, if anything. To get the distribution number of events, just take the sum of the posterior event matrix across the time dimension. In my case, there was no uncertainty about the number of events, so I don’t have this distribution to show you. If you data is noisier, this might be more interesting.

```
traj_noisy = traj + np.random.normal(scale=0.1, size=traj.shape)
with pm.Model() as model:
p = pm.Beta('p', alpha=1, beta=1)
events = pm.Bernoulli('events', p, size=(n_simulation, T)) * 1.0
rho = pm.Beta('rho', alpha=1, beta=1)
y0 = at.zeros(n_simulation)
mu, _ = aesara.scan(lambda event, y, rho: rho * y + event,
non_sequences=[rho],
sequences=[events.T],
outputs_info=[y0])
y_hat = pm.Normal('y_hat', mu=mu.T, sigma=0.1, observed=traj_noisy)
prior = pm.sample_prior_predictive()
```

Posterior predictive: