Disaggregate and find the change point in time series using PyMC

matrix_testing · May 27, 2023, 6:58pm

I have the following problem. I’ve a time series generated under the following scenario. Up to an unknown point T the data is generated using two random variables X1 and X2 with unknown means and standard deviation. After point T the data is generated by just X2. Is there a way to compute the mean / std of X1 and X2 while also finding T ? Here is my code so far with simulated data and it does not seem to be able to recover the original means.

np.random.seed(0)
n = 100
x = np.linspace(0, 10, n)
true_mean1 = 5.0
true_mean2 = 1.0
true_var1 = 0.2
true_var2 = 0.5
true_std1 = np.sqrt(true_var1)
true_std2 = np.sqrt(true_var2)
true_effective_period1 = int(n / 3)
true_effective_period2 = int(2 * n / 3)
true_data = np.concatenate(
[
np.random.normal(
true_mean1 + true_mean2,
np.sqrt(true_var1 + true_var2),
true_effective_period1,
),
np.random.normal(
true_mean2,
np.sqrt(true_var2),
true_effective_period2,
),
]
)
print(true_data.shape)
plt.plot(np.arange(n - 1), true_data)
plt.savefig(“orig.png”)

PyMC3 model

with pm.Model() as model:
# Priors for change point
tau = pm.DiscreteUniform(“tau”, lower=0, upper=n - 1)

# Priors for the means and standard deviations
mean1 = pm.Normal("mean1", mu=4, sd=2)
mean2 = pm.Normal("mean2", mu=4, sd=2)
var1 = pm.HalfNormal("var1", sd=0.5)
var2 = pm.HalfNormal("var2", sd=0.5)

# Define the likelihood. 34 because I've set it to simulate that way.
mu = pm.math.switch(tau > 34, mean1 + mean2, mean2)
sigma = pm.math.switch(tau > 34, var1 + var2, var2)
obs = pm.Normal("obs", mu=mu, sd=sigma, observed=true_data)

# Sample from the posterior
trace = pm.sample(1000, tune=5000, discard_tuned_samples=True, chains=5)
az.plot_trace(trace)
print(az.summary(trace, round_to=2))

plt.savefig(“t.png”)

jessegrabowski · May 27, 2023, 7:19pm

Everything you’re doing is basically correct, you’re just not doing the changepoint correctly. You want to compare the value of tau to an array of dates, not to the correct date. Here’s a corrected model:

with pm.Model() as model:
    # Priors for change point
    t = pm.MutableData('t', np.arange(n-1))
    tau = pm.DiscreteUniform('tau', lower=0, upper=n - 1)
    
    # Priors for the means and standard deviations
    mu = pm.Normal("mean1", mu=4, sigma=2, size=2)
    sigma = pm.HalfNormal("var1", sigma=0.5, size=2)

    mu_obs = pm.math.switch(t < tau, mu[0], mu[1])
    sigma_obs = pm.math.switch(t < tau, sigma[0], sigma[1])
    obs = pm.Normal("obs", mu=mu_obs, sigma=sigma_obs, observed=true_data)

    # Sample from the posterior
    idata = pm.sample()

This gives the following results:

            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
tau       33.000  0.000  33.000   33.000      0.000    0.000    4000.0    4000.0    NaN
mean1[0]   6.313  0.156   6.021    6.610      0.002    0.001    5555.0    3010.0    1.0
mean1[1]   0.927  0.082   0.781    1.094      0.001    0.001    5172.0    2990.0    1.0
var1[0]    0.888  0.106   0.704    1.086      0.001    0.001    6786.0    3106.0    1.0
var1[1]    0.676  0.058   0.569    0.786      0.001    0.001    5354.0    3570.0    1.0

You can see that all the true parameters are recovered, and the change point is nailed down with zero uncertainty.

matrix_testing · May 30, 2023, 7:09pm

Thank you! That fixed the problem. Also I had to change

import pymc3 as pm

to

 import pymc as pm

otherwise it complained that it could not find MutableData .

jessegrabowski · May 30, 2023, 11:19pm

I used PyMC 5, that’s why the import is different. To be honest I didn’t notice the v3 tag. If you upgraded, make sure you’re not mixing backends (theano-pymc and pytensor). It’s best to make a fresh environment via the suggested installation when you upgrade v3->v5

Topic		Replies	Views
Time series break point detection using PyMC3 - optimization Questions	8	3085	January 17, 2018
Modeling stock market signals (online changepoint detection) Questions	3	1775	March 29, 2020
Pymc3 changepoint model Questions	2	495	September 18, 2018
Generative time series modelling Questions	2	523	August 23, 2018
Multiple Time Series Questions	3	931	April 27, 2021

Disaggregate and find the change point in time series using PyMC

PyMC3 model

Related topics