Changepoint detection and magnitude estimation in a time series

I’ve been working on an approach to detect and characterize the magnitude of changepoints in a time series. Take for example the following case:

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(100)
y1 = 1 + x + np.random.randn(100) * 1
y2 = -1 * x + np.random.randn(100) * 2
y3 = 2 * x + np.random.randn(100) * 4

y = np.concatenate((y1[:33], y2[33:66] + 2 * y1[33], y3[66:] - 130))

plt.plot(y)

[image: plot of the simulated series y]

My approach consists of modeling the time series with a simple linear regression that has an intercept and a slope. The slope, however, is modeled as a random walk, with a Bernoulli process indicating whether there is a changepoint at each point in time. The probability of this Bernoulli process is given a Beta prior biased towards 0 (we assume changepoints are unlikely). When a changepoint does occur, its magnitude is modeled with a Laplace distribution (a multilevel approach, via `lambda_mu` and `lambda_b`, so that new changepoints can be sampled in the future). The code for this approach is shown below:

import pymc as pm
import aesara.tensor as aet  # PyMC v4; in PyMC >= 5 use `import pytensor.tensor as aet`

with pm.Model() as model:
    intercept = pm.Normal("intercept")
    slope_init = pm.Normal("slope_init")

    # per-step changepoint indicator, with a prior biased towards "no changepoint"
    prob = pm.Beta("prob", alpha=1, beta=20)
    active = pm.Bernoulli("active", prob, shape=99)
    # hierarchical location/scale for the changepoint magnitudes
    lambda_mu = pm.Normal("lambda_mu", 0, 0.01)
    lambda_b = pm.HalfNormal("lambda_b", 0.1)
    changepoints_magnitude = pm.Laplace(
        "changepoints_magnitude",
        mu=0, b=1,
        shape=99,
    )
    # the slope only changes at steps where `active` is 1
    slope_evolution = pm.Deterministic(
        "slope_evolution",
        slope_init + aet.cumsum(active * (lambda_mu + changepoints_magnitude * lambda_b), axis=-1),
    )
    rw_init = aet.zeros(shape=1)
    rw_innovations = pm.Normal(
        "rw_innovations",
        shape=99,
    )
    rw_raw = pm.Deterministic(
        "rw_raw",
        aet.cumsum(aet.concatenate([rw_init, slope_evolution + rw_innovations], axis=-1), axis=-1)
    )
    sd = pm.HalfNormal("sd", 0.01)
    slope = pm.Deterministic(
        "slope", rw_raw * sd
    )

    μ = pm.Deterministic("μ", intercept + slope)
    σ = pm.HalfNormal("σ", sigma=0.1)

    y_hat = pm.Normal("y", mu=μ, sigma=σ, observed=y, shape=100)
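For intuition, the generative process this encodes can be simulated directly in NumPy (ignoring the extra `rw_innovations` noise and the `sd` scaling for clarity; the values of `p`, `lambda_mu`, and `lambda_b` below are illustrative, not posterior estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
p = 0.05                        # illustrative changepoint probability
lambda_mu, lambda_b = 0.0, 0.5  # illustrative jump location / scale

# Bernoulli gate: does a changepoint fire between t-1 and t?
active = rng.random(T - 1) < p
# Laplace jump magnitudes, zeroed out wherever the gate is off
jumps = active * rng.laplace(loc=lambda_mu, scale=lambda_b, size=T - 1)

# the slope only moves when a changepoint fires
slope = np.concatenate([[1.0], 1.0 + np.cumsum(jumps)])
trend = np.cumsum(slope)        # integrating the slope gives the trend
```

The slope stays piecewise constant between changepoints, which is exactly why the posterior over `active` can be read off as a per-step changepoint probability.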

Running this model produces the following plot for the probability of a changepoint:
[image: posterior probability of a changepoint at each timestep]

as well as the magnitude for each of them:
[image: posterior changepoint magnitudes]

The in-sample fit looks like this:
[image: in-sample fit]

However, I’m observing some divergences in my model training:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 545 seconds.
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.

I’m assuming these arise because I’m mixing continuous and discrete random variables. I’ve read that it’s better to marginalize out the discrete variables to help the sampler, but I’m not sure how to achieve this for this model:

Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
>NUTS: [intercept, slope_init, prob, lambda_mu, lambda_b, changepoints_magnitude, rw_innovations, country_shrinkage_pop, σ]
>BinaryGibbsMetropolis: [active]
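For reference, here is roughly what "marginalizing `active`" means at the level of a single slope innovation: the Bernoulli is summed out, leaving a two-component mixture density. The exact marginal has a point mass at zero, which has no density, so this sketch relaxes it to a narrow Normal — a common continuous relaxation, not the model above verbatim; `eps` is an assumed relaxation width:

```python
import numpy as np

def _logsumexp2(a, b):
    # numerically stable log(exp(a) + exp(b))
    m = np.maximum(a, b)
    return m + np.log(np.exp(a - m) + np.exp(b - m))

def innovation_logpdf(x, p, b, eps=0.1):
    """Marginal log-density of one slope innovation with the Bernoulli
    summed out: (1 - p) * Normal(0, eps) + p * Laplace(0, b).
    Normal(0, eps) stands in for the point mass at 0 so the density is
    well-defined for gradient-based samplers."""
    x = np.asarray(x, dtype=float)
    log_no_cp = (np.log1p(-p) - 0.5 * (x / eps) ** 2
                 - 0.5 * np.log(2 * np.pi) - np.log(eps))
    log_cp = np.log(p) - np.abs(x) / b - np.log(2 * b)
    return _logsumexp2(log_no_cp, log_cp)
```

In PyMC, a per-step mixture like this could be expressed with `pm.Mixture`, which removes `active` (and hence the `BinaryGibbsMetropolis` step) from the model entirely.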

Can anybody help me understand how to correct this model to eliminate the divergences?

Hi. There are a number of other threads here on changepoint models which could be useful, e.g.:

Hi Benjamin! Thanks for sharing the link. I had a look at @ckrapu’s code and I’ve been playing with some of the ideas proposed there. However, it seems that their implementation requires proposing a maximum number of changepoints M that you would expect and (more importantly) places those proposed changepoints equally spaced across the data, i.e.

changepoints = pm.Deterministic('changepoints', (2 + at.cumsum(changepoints_raw) * (T-4)))

Not sure if there’s a way to make this more flexible and adapt the location of the changepoints.
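One way to let the sampler adapt the locations is to make them continuous parameters and replace the hard `t >= changepoint` indicator with a smooth sigmoid step, so the locations receive gradients under NUTS. A NumPy sketch of the deterministic part (the function name and the `tau` smoothing width are my own, not from the linked code):

```python
import numpy as np

def soft_step_trend(t, cp_locs, cp_deltas, base_slope, tau=1.0):
    """Trend with *continuous* changepoint locations: each hard
    indicator 1[t >= cp] is relaxed to a sigmoid of width `tau`,
    keeping the locations differentiable."""
    # shape (T, n_cps): smooth version of `t >= cp_locs`
    steps = 1.0 / (1.0 + np.exp(-(t[:, None] - cp_locs[None, :]) / tau))
    slope = base_slope + steps @ cp_deltas  # slope at each timestep
    return np.cumsum(slope)                 # integrate slope into trend

t = np.arange(100, dtype=float)
trend = soft_step_trend(
    t, cp_locs=np.array([33.0, 66.0]),
    cp_deltas=np.array([-2.0, 3.0]), base_slope=1.0,
)
```

In the model, `cp_locs` could then get a continuous prior (e.g. a sorted `pm.Uniform` over `[0, T]`) instead of `pm.DiscreteUniform`, with the same expression written in `aet` ops.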

One thing I would caution against is treating the observed data points as the only possible locations of your changepoints. That may be ok if you have uniformly distributed observations and they are relatively closely spaced. But if you have unevenly-spaced observations, things could get weird.

Thanks for bringing this up @cluhmann. The time series data I’m working on is evenly-spaced, so this would not be an issue for this case.

I checked some of the code in the shared link and came up with the following:

import pymc as pm
import aesara.tensor as aet

T = len(y)  # 100 observations
max_cp_inference = 10
tiled_times = np.arange(T)[:, None].repeat(max_cp_inference, axis=1)

with pm.Model() as model:
    intercept = pm.Normal("intercept")
    slope_init = pm.Normal("slope_init")
    
    n_cps = pm.DiscreteUniform("n_cps", 0, max_cp_inference)
    cp_times = pm.DiscreteUniform("cp_times", 1, T-1, shape=max_cp_inference)
    cp_times_sorted = cp_times.sort()
    is_timestep_past_cp = (tiled_times >= cp_times_sorted[None, :].repeat(T, axis=0))
    # We need to disable transforms so that we can use the posterior points directly
    # As the transformed draws are not saved
    cp_sd = pm.HalfNormal('cp_sd', sigma=0.01, transform=None)
    cp_deltas = pm.Normal('cp_deltas', sigma=cp_sd, shape=max_cp_inference)
    cp_contrib = aet.sum(cp_deltas[:n_cps] * is_timestep_past_cp[:, :n_cps], axis=1)
    slope = pm.Deterministic(
        "slope",
        aet.cumsum(slope_init + cp_contrib, axis=-1)
    )
    
    μ = pm.Deterministic("μ", 
                         intercept 
                         + slope
                        )
    σ = pm.HalfNormal("σ", sigma=0.1)
    
    y_hat = pm.Normal("y", mu=μ, sigma=σ, observed=y, shape=100)
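To double-check the indicator construction outside the model, the same logic can be run in plain NumPy on a hypothetical draw (the values below are made up for illustration):

```python
import numpy as np

T, max_cp = 100, 10
n_cps = 2  # hypothetical draw of the number of active changepoints
cp_times_sorted = np.sort(np.array([66, 33, 90, 12, 50, 75, 20, 44, 81, 95]))
cp_deltas = np.array([0.5, -0.3, 0.8, 0.1, -0.6, 0.2, 0.4, -0.1, 0.7, -0.2])

# row t, column k is True once timestep t has passed changepoint k
tiled_times = np.arange(T)[:, None].repeat(max_cp, axis=1)
is_timestep_past_cp = tiled_times >= cp_times_sorted[None, :].repeat(T, axis=0)

# only the first n_cps (earliest) changepoints contribute to the slope
cp_contrib = np.sum(cp_deltas[:n_cps] * is_timestep_past_cp[:, :n_cps], axis=1)
trend = np.cumsum(1.0 + cp_contrib)  # slope_init = 1.0 for illustration
```

Each active changepoint adds a constant to the slope from its (sorted) time onward, and the cumulative sum turns that piecewise-constant slope into a piecewise-linear trend.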

I get no divergences, but several warnings instead:

/home/ec2-user/anaconda3/envs/pymc_p310/lib/python3.10/site-packages/pymc/step_methods/metropolis.py:296: RuntimeWarning: overflow encountered in exp
  "accept": np.mean(np.exp(self.accept_rate_iter)),
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 64 seconds.
The acceptance probability does not match the target. It is 0.8799, but should be close to 0.8. Try to increase the number of tuning steps.

However, I noticed that the `n_cps` RV tends to end up at the maximum number of changepoints allowed, `max_cp_inference`. Another disadvantage of this model is that I can no longer recover the probability of a changepoint at each timestep, nor a probability distribution for each magnitude, as in the original implementation: