Disaggregate and find the change point in time series using PyMC

I have the following problem. I’ve a time series generated under the following scenario. Up to an unknown point T the data is generated using two random variables X1 and X2 with unknown means and standard deviation. After point T the data is generated by just X2. Is there a way to compute the mean / std of X1 and X2 while also finding T ? Here is my code so far with simulated data and it does not seem to be able to recover the original means.

np.random.seed(0)
n = 100
x = np.linspace(0, 10, n)
true_mean1 = 5.0
true_mean2 = 1.0
true_var1 = 0.2
true_var2 = 0.5
true_std1 = np.sqrt(true_var1)
true_std2 = np.sqrt(true_var2)
true_effective_period1 = int(n / 3)
true_effective_period2 = int(2 * n / 3)
true_data = np.concatenate(
[
np.random.normal(
true_mean1 + true_mean2,
np.sqrt(true_var1 + true_var2),
true_effective_period1,
),
np.random.normal(
true_mean2,
np.sqrt(true_var2),
true_effective_period2,
),
]
)
print(true_data.shape)
plt.plot(np.arange(n - 1), true_data)
plt.savefig(“orig.png”)

PyMC3 model

with pm.Model() as model:
# Priors for change point
tau = pm.DiscreteUniform(“tau”, lower=0, upper=n - 1)

# Priors for the means and standard deviations
mean1 = pm.Normal("mean1", mu=4, sd=2)
mean2 = pm.Normal("mean2", mu=4, sd=2)
var1 = pm.HalfNormal("var1", sd=0.5)
var2 = pm.HalfNormal("var2", sd=0.5)

# Define the likelihood. 34 because I've set it to simulate that way.
mu = pm.math.switch(tau > 34, mean1 + mean2, mean2)
sigma = pm.math.switch(tau > 34, var1 + var2, var2)
obs = pm.Normal("obs", mu=mu, sd=sigma, observed=true_data)

# Sample from the posterior
trace = pm.sample(1000, tune=5000, discard_tuned_samples=True, chains=5)
az.plot_trace(trace)
print(az.summary(trace, round_to=2))

plt.savefig(“t.png”)

Everything you’re doing is basically correct, you’re just not doing the changepoint correctly. You want to compare the value of tau to an array of dates, not to the correct date. Here’s a corrected model:

with pm.Model() as model:
    # Priors for change point
    t = pm.MutableData('t', np.arange(n-1))
    tau = pm.DiscreteUniform('tau', lower=0, upper=n - 1)
    
    # Priors for the means and standard deviations
    mu = pm.Normal("mean1", mu=4, sigma=2, size=2)
    sigma = pm.HalfNormal("var1", sigma=0.5, size=2)

    mu_obs = pm.math.switch(t < tau, mu[0], mu[1])
    sigma_obs = pm.math.switch(t < tau, sigma[0], sigma[1])
    obs = pm.Normal("obs", mu=mu_obs, sigma=sigma_obs, observed=true_data)

    # Sample from the posterior
    idata = pm.sample()

This gives the following results:

            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
tau       33.000  0.000  33.000   33.000      0.000    0.000    4000.0    4000.0    NaN
mean1[0]   6.313  0.156   6.021    6.610      0.002    0.001    5555.0    3010.0    1.0
mean1[1]   0.927  0.082   0.781    1.094      0.001    0.001    5172.0    2990.0    1.0
var1[0]    0.888  0.106   0.704    1.086      0.001    0.001    6786.0    3106.0    1.0
var1[1]    0.676  0.058   0.569    0.786      0.001    0.001    5354.0    3570.0    1.0

You can see that all the true parameters are recovered, and the change point is nailed down with zero uncertainty.

Thank you! That fixed the problem. Also I had to change

import pymc3 as pm

to

 import pymc as pm

otherwise it complained that it could not find MutableData .

I used PyMC 5, that’s why the import is different. To be honest I didn’t notice the v3 tag. If you upgraded, make sure you’re not mixing backends (theano-pymc and pytensor). It’s best to make a fresh environment via the suggested installation when you upgrade v3->v5