Thanks for the suggestion. I didn’t really need k to be as large as 500; for my purposes, k = 10 is enough.
In [62]: x0=.5
...: k=10
...: t=np.linspace(0., 1., 2500)
...: np.exp(-k*(t-x0))
...:
Out[62]:
array([ 1.48413159e+02, 1.47820456e+02, 1.47230119e+02, ...,
6.79208851e-03, 6.76496359e-03, 6.73794700e-03])
The range of the exponential is much smaller now, but the mass matrix issue persists.
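To make the comparison with k = 500 concrete, here is a standalone NumPy check of the dynamic range (same x0 and t as above, separate from the model):

import numpy as np

x0 = 0.5
t = np.linspace(0., 1., 2500)

# Ratio of the largest to the smallest value of exp(-k*(t - x0)) over [0, 1]
for k in (10, 500):
    vals = np.exp(-k * (t - x0))
    print(k, vals.max() / vals.min())
# k = 10  -> ratio ~ e**10  ~ 2.2e4
# k = 500 -> ratio ~ e**500 ~ 1.4e217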
The model again:
In [69]: def logistic(L, x0, k=10, t=np.linspace(0., 1., 2500)):
...: return L/(1+tt.exp(-k*(t-x0)))
...:
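For reference, the same function with np.exp in place of tt.exp (just so it can be evaluated outside the model) acts as a smooth switch from roughly 0 before x0 to roughly L after it:

import numpy as np

def logistic_np(L, x0, k=10, t=np.linspace(0., 1., 2500)):
    # NumPy copy of logistic() above, only for inspection outside Theano
    return L / (1 + np.exp(-k * (t - x0)))

vals = logistic_np(L=1.0, x0=0.5)
print(vals[0], vals[1250], vals[-1])
# ~0.0067 at t = 0, ~0.5 near t = x0, ~0.9933 at t = 1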
In [70]: with pm.Model() as model_before_700:
...: alpha = pm.Normal("alpha", mu = 0, sd = 3.0, shape = 2)
...: beta = pm.Normal("beta", mu = 0, sd = 1.0, shape = 2)
...: switchpoints = pm.Beta("switchpoints", alpha =1, beta=1)
...: sd = pm.HalfCauchy("sd", 0.5, shape = 2)
...: intercept = alpha[0] + logistic(alpha[1], switchpoints)
...: slope = beta[0] + logistic(beta[1], switchpoints)
...: dev = sd[0] + logistic(sd[1], switchpoints)
...:
In [71]: with model_before_700:
...: regression = intercept + slope*np.arange(2500)
...: observed = pm.Normal("observed", mu = regression, sd = dev, observed = data_before_700)
...:
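Spelled out, with $s_i$ the logistic switch evaluated at $t_i$ and $i = 0, \dots, 2499$ indexing the observations, the model these blocks define is:

$$
\begin{aligned}
s_i &= \frac{1}{1 + e^{-k (t_i - x_0)}}, \qquad t_i \in [0, 1],\ k = 10,\ x_0 \sim \mathrm{Beta}(1, 1),\\
\mu_i &= \left(\alpha_0 + \alpha_1 s_i\right) + \left(\beta_0 + \beta_1 s_i\right) i,\\
\sigma_i &= \mathrm{sd}_0 + \mathrm{sd}_1 s_i,\\
\text{observed}_i &\sim \mathcal{N}(\mu_i, \sigma_i).
\end{aligned}
$$

(The slope multiplies the raw index i from np.arange(2500), while the switch itself runs over t_i in [0, 1].)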
In [73]: with model_before_700:
...: trace_before_700 = pm.sample(tune = 2000, draws = 1000, njobs = 3)
...:
I don’t know if this is helpful, but the sampler does run for quite a few iterations before halting with that mass matrix exception, and sampling is noticeably slow while it runs (slower than using NUTS + Metropolis on the DiscreteUniform switchpoint version of the model).