There are a few tiny issues. Most fundamentally, because switchpoint is discrete, Metropolis-Hastings (MH) is used to sample it, and MH needs a lot of tuning steps to explore the space. When you don't use enough, you get what you observed: the sampler hasn't spent enough time bouncing around to find the optimum.
For MH I tend to use:

burn_in = 10000
tune = 5000

with pm.Model() as model:
    ...
    trace = pm.sample(1000, tune=tune + burn_in, chains=8, cores=3)

pm.sample has no separate burn-in argument, so the burn-in is folded into tune; the tuning samples are discarded by default.
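With a discrete variable sampled by MH it is also worth confirming that the chains actually mix. A minimal check, assuming the variable is named 'switchpoint' (depending on your PyMC3 version the keyword may be varnames rather than var_names):

pm.traceplot(trace, var_names=['switchpoint'])       # chains should overlap, not get stuck
print(pm.summary(trace, var_names=['switchpoint']))  # an r_hat near 1 indicates convergence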
The second (tiny, tiny) issue is with

model = pm.math.switch(switchpoint >= X, part1, part2)

This should read

model = pm.math.switch(tt.gt(switchpoint, X), part1, part2)

(with import theano.tensor as tt at the top).
The third issue is with

std = pm.HalfNormal('std', sd=5)

This prior puts a lot of weight on noise scales that exceed the range of the data. Using a more informative prior like

std = pm.HalfNormal('std', sd=0.5)

makes a huge difference in both cases.
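A quick way to see the difference is to draw from each prior directly and compare the tails against the spread of y; a minimal sketch (the 10000 draws are arbitrary):

import numpy as np
import pymc3 as pm

wide = pm.HalfNormal.dist(sd=5).random(size=10000)     # the original prior
tight = pm.HalfNormal.dist(sd=0.5).random(size=10000)  # the informative prior
print(np.percentile(wide, 95), np.percentile(tight, 95))  # compare to the data's range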
With these three changes, model 1 appears to work just fine.
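For concreteness, here is how the pieces fit together end to end. The synthetic data and the two constant-mean segments below are made up for illustration (the original model 1 may differ):

import numpy as np
import pymc3 as pm
import theano.tensor as tt

# synthetic data, purely illustrative: a jump in the mean at X = 6
X = np.linspace(0, 10, 200)
y = np.where(X < 6, 1.0, 2.0) + np.random.normal(0, 0.3, X.size)

burn_in = 10000
tune = 5000

with pm.Model() as model:
    switchpoint = pm.DiscreteUniform('switchpoint', lower=0, upper=10)
    mu1 = pm.Normal('mu1', mu=0, sd=5)  # level before the switch
    mu2 = pm.Normal('mu2', mu=0, sd=5)  # level after the switch
    std = pm.HalfNormal('std', sd=0.5)  # the tighter, more informative prior
    mu = pm.math.switch(tt.gt(switchpoint, X), mu1, mu2)
    y_lik = pm.Normal('y_lik', mu=mu, sd=std, observed=y)
    trace = pm.sample(1000, tune=tune + burn_in, chains=8, cores=3)
    ppc = pm.sample_posterior_predictive(trace)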
and with confidence region:
import numpy as np
import matplotlib.pyplot as plt

# ppc comes from pm.sample_posterior_predictive(trace) above;
# 'y_lik' is the name of the observed variable
y_mean = np.mean(ppc['y_lik'], axis=0)
y_u95 = np.percentile(ppc['y_lik'], 97.5, axis=0)  # upper edge of the 95% band
y_l05 = np.percentile(ppc['y_lik'], 2.5, axis=0)   # lower edge of the 95% band
plt.plot(X, y_mean)
plt.plot(X, y_u95, color='red')
plt.plot(X, y_l05, color='red')
Edit:
Oh and for model 2, I think the sigmoid weights are reversed. With

w = tt.nnet.sigmoid(2*(X-switchpoint))

w approaches 1 for X above the switchpoint, so the mixture should be

model = (1-w)*part1 + w*part2

rather than

model = w*part1 + (1-w)*part2
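A quick numeric sanity check of the orientation, in plain numpy with a made-up switchpoint at 5:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([2.0, 5.0, 8.0])  # before, at, and after the switch
w = sigmoid(2 * (X - 5.0))
print(w)  # approx [0.002, 0.5, 0.998]

Since w is essentially 1 after the switchpoint, it has to be the weight on part2.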