# Disaggregate and find the change point in time series using PyMC

I have the following problem. I have a time series generated under the following scenario: up to an unknown point `T`, the data is generated as the sum of two random variables X1 and X2 with unknown means and standard deviations; after point `T`, the data is generated by X2 alone. Is there a way to estimate the mean / std of X1 and X2 while also finding `T`? Here is my code so far with simulated data; it does not seem to be able to recover the original means.

```python
import numpy as np
import matplotlib.pyplot as plt
import pymc3 as pm
import arviz as az

np.random.seed(0)
n = 100
x = np.linspace(0, 10, n)
true_mean1 = 5.0
true_mean2 = 1.0
true_var1 = 0.2
true_var2 = 0.5
true_std1 = np.sqrt(true_var1)
true_std2 = np.sqrt(true_var2)
true_effective_period1 = int(n / 3)
true_effective_period2 = int(2 * n / 3)
true_data = np.concatenate(
    [
        np.random.normal(
            true_mean1 + true_mean2,
            np.sqrt(true_var1 + true_var2),
            true_effective_period1,
        ),
        np.random.normal(
            true_mean2,
            np.sqrt(true_var2),
            true_effective_period2,
        ),
    ]
)
print(true_data.shape)
plt.plot(np.arange(n - 1), true_data)
plt.savefig("orig.png")
```

```python
# PyMC3 model
with pm.Model() as model:
    # Prior for the change point
    tau = pm.DiscreteUniform("tau", lower=0, upper=n - 1)

    # Priors for the means and standard deviations
    mean1 = pm.Normal("mean1", mu=4, sd=2)
    mean2 = pm.Normal("mean2", mu=4, sd=2)
    var1 = pm.HalfNormal("var1", sd=0.5)
    var2 = pm.HalfNormal("var2", sd=0.5)

    # Define the likelihood. 34 because I've set it to simulate that way.
    mu = pm.math.switch(tau > 34, mean1 + mean2, mean2)
    sigma = pm.math.switch(tau > 34, var1 + var2, var2)
    obs = pm.Normal("obs", mu=mu, sd=sigma, observed=true_data)

    # Sample from the posterior
    trace = pm.sample(1000, tune=5000, discard_tuned_samples=True, chains=5)
    az.plot_trace(trace)
    print(az.summary(trace, round_to=2))

plt.savefig("t.png")
```

Everything you're doing is basically correct; the problem is how you implement the changepoint. You want to compare `tau` to an array of time indices, not to a single fixed value. Here's a corrected model:

```python
with pm.Model() as model:
    # Time index to compare against the change point
    t = pm.MutableData("t", np.arange(n - 1))

    # Prior for the change point
    tau = pm.DiscreteUniform("tau", lower=0, upper=n - 1)

    # Priors for the means and standard deviations of the two regimes.
    # (Names "mean1"/"var1" are kept from your model, but each variable now
    # holds both regimes; note "var1" is a standard deviation, not a variance.)
    mu = pm.Normal("mean1", mu=4, sigma=2, size=2)
    sigma = pm.HalfNormal("var1", sigma=0.5, size=2)

    # Select regime parameters elementwise, depending on which side of tau
    # each observation falls
    mu_obs = pm.math.switch(t < tau, mu[0], mu[1])
    sigma_obs = pm.math.switch(t < tau, sigma[0], sigma[1])
    obs = pm.Normal("obs", mu=mu_obs, sigma=sigma_obs, observed=true_data)

    # Sample from the posterior
    idata = pm.sample()
```

This gives the following results:

```
            mean     sd  hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
tau       33.000  0.000  33.000   33.000      0.000    0.000    4000.0    4000.0    NaN
mean1[0]   6.313  0.156   6.021    6.610      0.002    0.001    5555.0    3010.0    1.0
mean1[1]   0.927  0.082   0.781    1.094      0.001    0.001    5172.0    2990.0    1.0
var1[0]    0.888  0.106   0.704    1.086      0.001    0.001    6786.0    3106.0    1.0
var1[1]    0.676  0.058   0.569    0.786      0.001    0.001    5354.0    3570.0    1.0
```

You can see that the true parameters of both segments are recovered (e.g. segment 1's mean of ~6.3 against the true 5.0 + 1.0 = 6.0), and the change point is pinned at 33 (= n / 3) with essentially zero posterior uncertainty.
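Since the first segment is X1 + X2 and the second is X2 alone, you can disaggregate X1 after the fact by differencing posterior draws. A minimal sketch, assuming X1 and X2 are independent (so variances add); the arrays below are made-up draws mimicking the summary above, standing in for what you would pull out of `idata` (e.g. with `az.extract`):

```python
import numpy as np

# Hypothetical posterior draws (stand-ins for the real ones from idata)
rng = np.random.default_rng(0)
mu0 = rng.normal(6.31, 0.16, 4000)  # segment-1 mean = mean(X1) + mean(X2)
mu1 = rng.normal(0.93, 0.08, 4000)  # segment-2 mean = mean(X2)
sd0 = rng.normal(0.89, 0.11, 4000)  # segment-1 std = sqrt(var(X1) + var(X2))
sd1 = rng.normal(0.68, 0.06, 4000)  # segment-2 std = sqrt(var(X2))

# Under independence: mean(X1) = mu0 - mu1, var(X1) = sd0**2 - sd1**2
mean_x1 = mu0 - mu1
std_x1 = np.sqrt(np.clip(sd0**2 - sd1**2, 0, None))  # clip guards against negative draws

print(mean_x1.mean(), std_x1.mean())
```

This gives you a full posterior for X1's parameters, not just a point estimate, so the uncertainty from both segments propagates through.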

Thank you! That fixed the problem. I also had to change

```python
import pymc3 as pm
```

to

```python
import pymc as pm
```

otherwise it complained that it could not find `MutableData`.

I used PyMC 5; that's why the import is different. To be honest, I didn't notice the v3 tag. If you upgraded, make sure you're not mixing backends (theano-pymc from v3 and pytensor from v5). It's best to create a fresh environment via the suggested installation instructions when you upgrade from v3 to v5.
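One way to guarantee the backends don't mix is to start from a clean environment. A sketch using conda (assuming conda-forge, per the PyMC installation guide; the environment name is just an example):

```shell
# Fresh environment so the v3 backend (theano-pymc) and the v5 backend
# (pytensor) never coexist in the same site-packages
conda create -n pymc_env -c conda-forge "pymc>=5"
conda activate pymc_env

# Sanity check: should report a 5.x version
python -c "import pymc; print(pymc.__version__)"
```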