Changepoints detection and magnitude in a time series

jroberayalas · December 5, 2022, 4:23am

I’ve been working on an approach to detect and characterize the magnitude of changepoints in a time series. Take for example the following case:

x = np.arange(100)
y1 = 1 + x + np.random.randn(100) * 1
y2 = -1 * x + np.random.randn(100) * 2
y3 = 2 * x + np.random.randn(100) * 4

y = np.concatenate((y1[:33], y2[33:66] + 2*y1[33], y3[66:] - 130))

plt.plot(y)

My approach consists on modeling a time series using a simple linear regression model where there is an intercept and a slope. However, the latter is modeled as a random walk where there is a Bernoulli process that indicates if at a given point in time there’s a changepoint. The probability of this Bernoulli process is modeled using a Beta distribution biased towards 0 (we are assuming it is more likely there are no changepoints). In case there’s a changepoint, its magnitude is modeled using a Laplace distribution (we use a multilevel approach here to be able to sample new changepoints in the future using lambda_mu and lambda_b). The code for this approach is shown below:

with pm.Model() as model:
    intercept = pm.Normal("intercept")
    slope_init = pm.Normal("slope_init")
    
    prob = pm.Beta("prob", alpha=1, beta=20)
    active = pm.Bernoulli("active", prob, shape=99)
    lambda_mu = pm.Normal("lambda_mu", 0, 0.01)
    lambda_b = pm.HalfNormal("lambda_b", 0.1)
    changepoints_magnitude = pm.Laplace(
        "changepoints_magnitude",
        mu=0, b=1,
        shape=99,
    )
    slope_evolution = pm.Deterministic(
        "slope_evolution",
        slope_init + aet.cumsum(active * (lambda_mu + changepoints_magnitude * lambda_b), axis=-1),
    )
    rw_init = aet.zeros(shape=1)
    rw_innovations = pm.Normal(
        "rw_innovations",
        shape=99,
    )
    rw_raw = pm.Deterministic(
        "rw_raw",
        aet.cumsum(aet.concatenate([rw_init, slope_evolution + rw_innovations], axis=-1), axis=-1)
    )
    sd = pm.HalfNormal("sd", 0.01)
    slope = pm.Deterministic(
        "slope", rw_raw * sd
    )
    
    μ = pm.Deterministic("μ", 
                         intercept 
                         + slope
                        )
    σ = pm.HalfNormal("σ", sigma=0.1)
    
    y_hat = pm.Normal("y", mu=μ, sigma=σ, observed=y, shape=100)

Running this model produces the following plot for the probability of a changepoint:

as well as the magnitude for each of them:

The in-sample fit looks like this:

However, I’m observing some divergences in my model training:

Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 545 seconds.
INFO:pymc:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 545 seconds.
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There was 1 divergence after tuning. Increase `target_accept` or reparameterize.

I’m assuming these are because I’m using both continuous and discrete random variables. I’ve read it is better to marginalize the discrete variables to help with the sampling, however I’m not sure how to achieve this for this model:

Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc:Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
INFO:pymc:CompoundStep
>NUTS: [intercept, slope_init, prob, lambda_mu, lambda_b, changepoints_magnitude, rw_innovations, country_shrinkage_pop, σ]
INFO:pymc:>NUTS: [intercept, slope_init, prob, lambda_mu, lambda_b, changepoints_magnitude, rw_innovations, country_shrinkage_pop, σ]
>BinaryGibbsMetropolis: [active]
INFO:pymc:>BinaryGibbsMetropolis: [active]

Can anybody help me understand how to correct this model to eliminate the divergences?

drbenvincent · December 6, 2022, 9:44am

Hi. There are a number of other threads here on change point models which could be useful. Eg.

jroberayalas · December 6, 2022, 7:42pm

Hi Benjamin! Thanks for sharing the link. I had a look at @ckrapu’s code and I’ve been playing with some of the ideas proposed there. However, it seems that their implementation requires to propose a maximum number of changepoints M that you would expect and (more importantly) places those proposed changepoints equally-spaced across the data, i.e.

changepoints = pm.Deterministic('changepoints', (2 + at.cumsum(changepoints_raw) * (T-4)))

Not sure if there’s a way to make this more flexible and adapt the location of the changepoints.

cluhmann · December 6, 2022, 8:56pm

One thing I would caution against is treating the observed data points as the only possible locations of your changepoints. That may be ok if you have uniformly distributed observations and they are relatively closely spaced. But if you have unevenly-spaced observations, things could get weird.

jroberayalas · December 6, 2022, 9:00pm

Thanks for bringing this up @cluhmann. The time series data I’m working on is evenly-spaced, so this would not be an issue for this case.

jroberayalas · December 7, 2022, 12:54am

I checked some of the codes in the shared link and came up with the following piece of code:

max_cp_inference = 10
tiled_times = np.arange(T)[:, None].repeat(max_cp_inference, axis=1)

with pm.Model() as model:
    intercept = pm.Normal("intercept")
    slope_init = pm.Normal("slope_init")
    
    n_cps = pm.DiscreteUniform("n_cps", 0, max_cp_inference)
    cp_times = pm.DiscreteUniform("cp_times", 1, T-1, shape=max_cp_inference)
    cp_times_sorted = cp_times.sort()
    is_timestep_past_cp = (tiled_times >= cp_times_sorted[None, :].repeat(T, axis=0))
    # We need to disable transforms so that we can use the posterior points directly
    # As the transformed draws are not saved
    cp_sd = pm.HalfNormal('cp_sd', sigma=0.01, transform=None)
    cp_deltas = pm.Normal('cp_deltas', sigma=cp_sd, shape=max_cp_inference)
    cp_contrib = aet.sum(cp_deltas[:n_cps] * is_timestep_past_cp[:, :n_cps], axis=1)
    slope = pm.Deterministic(
        "slope",
        aet.cumsum(slope_init + cp_contrib, axis=-1)
    )
    
    μ = pm.Deterministic("μ", 
                         intercept 
                         + slope
                        )
    σ = pm.HalfNormal("σ", sigma=0.1)
    
    y_hat = pm.Normal("y", mu=μ, sigma=σ, observed=y, shape=100)

I get no divergences, but several warnings instead:

/home/ec2-user/anaconda3/envs/pymc_p310/lib/python3.10/site-packages/pymc/step_methods/metropolis.py:296: RuntimeWarning: overflow encountered in exp
  "accept": np.mean(np.exp(self.accept_rate_iter)),
/home/ec2-user/anaconda3/envs/pymc_p310/lib/python3.10/site-packages/pymc/step_methods/metropolis.py:296: RuntimeWarning: overflow encountered in exp
  "accept": np.mean(np.exp(self.accept_rate_iter)),
/home/ec2-user/anaconda3/envs/pymc_p310/lib/python3.10/site-packages/pymc/step_methods/metropolis.py:296: RuntimeWarning: overflow encountered in exp
  "accept": np.mean(np.exp(self.accept_rate_iter)),
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 64 seconds.
INFO:pymc:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 64 seconds.
The acceptance probability does not match the target. It is 0.8799, but should be close to 0.8. Try to increase the number of tuning steps.
WARNING:pymc:The acceptance probability does not match the target. It is 0.8799, but should be close to 0.8. Try to increase the number of tuning steps

However, I noticed that the n_cps RV tends to be equal to the maximum number of changepoints allowed max_cp_inference . In addition, the disadvantage of this model is that I’m not able to get the probability of a changepoint and the probability distribution for the magnitude as in the original implementation:

jroberayalas:

prob = pm.Beta("prob", alpha=1, beta=20)
    active = pm.Bernoulli("active", prob, shape=99)
    lambda_mu = pm.Normal("lambda_mu", 0, 0.01)
    lambda_b = pm.HalfNormal("lambda_b", 0.1)

Topic		Replies	Views
Hierarchical changepoint detection v5	14	2064	June 13, 2024
Modeling stock market signals (online changepoint detection) Questions	3	1834	March 29, 2020
Pymc3 changepoint model Questions	2	502	September 18, 2018
Converting Model from Stan Questions	15	3130	June 27, 2017
Disaggregate and find the change point in time series using PyMC v3 modeling	3	405	May 30, 2023

Changepoints detection and magnitude in a time series

Related topics