More informative priors usually help, for example: theta = pm.Normal('theta', mu=0, sd=1, shape=num_features)
Going a bit more in depth, your data contains a lot of columns (2100 training examples with 600 features each), so a sparse regression with a horseshoe prior would be more appropriate.
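To give a feel for why the horseshoe suits this many features, here is a rough NumPy-only sketch that draws one coefficient vector from a Normal(0, 1) prior and one from a horseshoe prior. It is not your model, just a prior comparison. The helper name sample_horseshoe and the fixed global scale tau = 0.1 are my own assumptions for illustration; in a real model tau would normally get its own half-Cauchy prior rather than being fixed.

```python
import numpy as np


def sample_horseshoe(rng, num_features, tau):
    """Draw one coefficient vector from a horseshoe prior.

    Hypothetical helper for illustration; tau is a fixed global scale here.
    """
    # Local scales: lambda_j ~ HalfCauchy(1), taken as |standard Cauchy|.
    lam = np.abs(rng.standard_cauchy(size=num_features))
    # Coefficients: theta_j ~ Normal(0, tau * lambda_j).
    return tau * lam * rng.standard_normal(num_features)


rng = np.random.default_rng(0)
num_features = 600  # same as the 600 features in the question
tau = 0.1           # assumed global shrinkage scale, illustration only

theta_normal = rng.standard_normal(num_features)  # one Normal(0, 1) prior draw
theta_horseshoe = sample_horseshoe(rng, num_features, tau)

# Most horseshoe coefficients sit close to zero, while a few can be large.
print("median |theta|, Normal(0, 1):", np.median(np.abs(theta_normal)))
print("median |theta|, horseshoe:   ", np.median(np.abs(theta_horseshoe)))
print("max |theta|, horseshoe:      ", np.max(np.abs(theta_horseshoe)))
```

The heavy-tailed local scales are what let a handful of coefficients escape the shrinkage while the rest are pulled toward zero. In a PyMC model the same structure would be expressed with pm.HalfCauchy variables for the scales feeding the standard deviation of the pm.Normal for theta.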