Inferring how many and when events occur


I’m looking for some high level advice.

I have a system where we take measurements through time, t, and we want to infer (a) how many and (b) when events occurred:

Events cause the measure, y, to increase. In the red trace, two events occurred, in the green trace one occurred. In our data we have many hundreds of traces. We know the relationship of y = f(t, \theta).

It would seem that to set this up as a Bayesian inference problem we’d need to sample (a) how many events occurred (perhaps a Poisson distribution) and (b) continuous values for event times (perhaps a Uniform distribution with support over time samples were taken).

The issue is that the shape of the Uniform distribution is determined by the value sampled from the Poisson; i.e. if two events occurred (red trace) then you’d want to sample two event times, but if only one event occurred you’d want to sample only one.

Does pymc have a suitable mechanism for approaching this?

Thanks in advance

In general this is not straightforward to model in PyMC, given the unknown number of events occurred. A workaround I will suggest is to assume the max number of events is K, and build a latent “deconstruction” matrix with shape (num_observation, K), and use a strongly regularized prior (e.g., horseshoe) on the weights. This is similar to the thinking of generalized additive models (GAM) which a good example application being Prophet | Forecasting at scale..

Here in your application, you will need to generate this latent “deconstruction” matrix inside the model with the latent event occurred time (a Dirichlet or stick-breaking process is good choice of prior here)


To me this looks there are at least the following components:

  1. At each time t, there is a an unknown probability p_t that an event will occur.
  2. There is an autoregressive component, so that y_{t+1} = \rho y_t
  3. There is a (known?) maximum value y^\star, such that y_t = \max(y, y^\star)
  4. The “event impulse size”, the amount the trajectory increases when the event fires.

Here is some code to generate data in this setup:

T = 50
n_simulation = 100

# Parameters
p_event = 0.1
rho = 0.9
y_star = 3
impulse_size = 1.0

event_dummies = stats.bernoulli(p=p_event).rvs((n_simulation, T))
traj = np.zeros((n_simulation, T))
for t in range(1, T):
    X = traj[:, t-1] * rho + event_dummies[:, t] * impulse_size
    traj[:, t] = np.clip(X, 0, y_star)

Which produces data that looks like this, and I think matches your drawing?

You would then write a PyMC model to estimate p, rho, y_star, and impulse_size. Maybe y_star and impulse_size are known. Riffing on @junpenglao 's comment, you probably want a strongly regularized hierarchical prior on p, so that you strongly pull all the estimates towards 0. This is especially true if you can leverage some information sharing, either between the trajectories, or between the time-steps. I assume the “event matrix” of size (n_trajectory, T_timesteps) will be quite sparse.

You will need to scan to implement this model, which can be a bit complex if you’ve never done it. Also using a Bernoulli likelihood for the event matrix will mean you need to use a compound step, not pure NUTS, which is also a bit of a bummer for speed. You could experiment with other ways to get a sparse matrix of ones and zeros (at.switch?, 0.5) * 1.0?). I also used clip to keep values between 0 and y_star, but you might want to figure out a differentiable way to do that, probably something with a generalized inverse logistic function. I think PyMC doesn’t complain if you include at.clip in your model, but it adds a non-differentiable boundary that the sampler really doesn’t like (that’s also true of the proposed workarounds for the event matrix above).

Anyway, here’ s a really simple model I ran to recover the parameters rho=0.9 and p=0.1. It worked fine, but it was super slow (~20 minutes to sample). Rho was hard to sample, it had low ess, high r-hat, but the estimate was dead on. Consider this a starting point, if anything. To get the distribution number of events, just take the sum of the posterior event matrix across the time dimension. In my case, there was no uncertainty about the number of events, so I don’t have this distribution to show you. If you data is noisier, this might be more interesting.

traj_noisy = traj + np.random.normal(scale=0.1, size=traj.shape)
with pm.Model() as model:
    p = pm.Beta('p', alpha=1, beta=1)
    events = pm.Bernoulli('events', p, size=(n_simulation, T)) * 1.0
    rho = pm.Beta('rho', alpha=1, beta=1)
    y0 = at.zeros(n_simulation)
    mu, _ = aesara.scan(lambda event, y, rho: rho * y + event,
    y_hat = pm.Normal('y_hat', mu=mu.T, sigma=0.1, observed=traj_noisy)
    prior = pm.sample_prior_predictive()

Posterior predictive:


I just wanted to thank you both for these detailed suggestions.

It’s tricky to mark either as the concrete ‘solution’ given I was asking for suggested approaches.

But I’ve experimented with both these approaches in my work, and am very grateful for the excellent support.


1 Like