I’ve been working on an approach to detect and characterize the magnitude of changepoints in a time series. Take for example the following case:

```
x = np.arange(100)
y1 = 1 + x + np.random.randn(100) * 1
y2 = -1 * x + np.random.randn(100) * 2
y3 = 2 * x + np.random.randn(100) * 4
y = np.concatenate((y1[:33], y2[33:66] + 2*y1[33], y3[66:] - 130))
plt.plot(y)
```

My approach consists on modeling a time series using a simple linear regression model where there is an intercept and a slope. However, the latter is modeled as a random walk where there is a Bernoulli process that indicates if at a given point in time there’s a changepoint. The probability of this Bernoulli process is modeled using a Beta distribution biased towards 0 (we are assuming it is more likely there are no changepoints). In case there’s a changepoint, its magnitude is modeled using a Laplace distribution (we use a multilevel approach here to be able to sample new changepoints in the future using `lambda_mu`

and `lambda_b`

). The code for this approach is shown below:

```
with pm.Model() as model:
intercept = pm.Normal("intercept")
slope_init = pm.Normal("slope_init")
prob = pm.Beta("prob", alpha=1, beta=20)
active = pm.Bernoulli("active", prob, shape=99)
lambda_mu = pm.Normal("lambda_mu", 0, 0.01)
lambda_b = pm.HalfNormal("lambda_b", 0.1)
changepoints_magnitude = pm.Laplace(
"changepoints_magnitude",
mu=0, b=1,
shape=99,
)
slope_evolution = pm.Deterministic(
"slope_evolution",
slope_init + aet.cumsum(active * (lambda_mu + changepoints_magnitude * lambda_b), axis=-1),
)
rw_init = aet.zeros(shape=1)
rw_innovations = pm.Normal(
"rw_innovations",
shape=99,
)
rw_raw = pm.Deterministic(
"rw_raw",
aet.cumsum(aet.concatenate([rw_init, slope_evolution + rw_innovations], axis=-1), axis=-1)
)
sd = pm.HalfNormal("sd", 0.01)
slope = pm.Deterministic(
"slope", rw_raw * sd
)
μ = pm.Deterministic("μ",
intercept
+ slope
)
σ = pm.HalfNormal("σ", sigma=0.1)
y_hat = pm.Normal("y", mu=μ, sigma=σ, observed=y, shape=100)
```

Running this model produces the following plot for the probability of a changepoint:

as well as the magnitude for each of them:

The in-sample fit looks like this:

However, I’m observing some divergences in my model training:

```
Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 545 seconds.
INFO:pymc:Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 545 seconds.
There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There were 10 divergences after tuning. Increase `target_accept` or reparameterize.
There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There were 34 divergences after tuning. Increase `target_accept` or reparameterize.
There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
ERROR:pymc:There was 1 divergence after tuning. Increase `target_accept` or reparameterize.
```

I’m assuming these are because I’m using both continuous and discrete random variables. I’ve read it is better to marginalize the discrete variables to help with the sampling, however I’m not sure how to achieve this for this model:

```
Multiprocess sampling (4 chains in 4 jobs)
INFO:pymc:Multiprocess sampling (4 chains in 4 jobs)
CompoundStep
INFO:pymc:CompoundStep
>NUTS: [intercept, slope_init, prob, lambda_mu, lambda_b, changepoints_magnitude, rw_innovations, country_shrinkage_pop, σ]
INFO:pymc:>NUTS: [intercept, slope_init, prob, lambda_mu, lambda_b, changepoints_magnitude, rw_innovations, country_shrinkage_pop, σ]
>BinaryGibbsMetropolis: [active]
INFO:pymc:>BinaryGibbsMetropolis: [active]
```

Can anybody help me understand how to correct this model to eliminate the divergences?