Random Intercepts and Slopes (Correlated) Convergence Issues

There are quite a few places where the Stan model and the PyMC3 model differ, but the one that usually matters most for convergence is the centering of the predictor matrix (every column of X except the intercept has its column mean subtracted):

transformed data { 
  int Kc = K - 1; 
  matrix[N, K - 1] Xc;  // centered version of X 
  vector[K - 1] means_X;  // column means of X before centering 
  for (i in 2:K) { 
    means_X[i - 1] = mean(X[, i]); 
    Xc[, i - 1] = X[, i] - means_X[i - 1]; 
  } 
} 

You should do the same for your input X in Python before passing it to the PyMC3 model.
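
A minimal NumPy sketch of the same centering step, in case it helps. It assumes X is an (N, K) design matrix whose first column is the intercept column of ones, as the Stan block implies; the function name `center_predictors` and the toy data are made up for illustration and are not from your model.

```python
import numpy as np

def center_predictors(X):
    """Mirror the Stan transformed data block above.

    Assumes column 0 of X is the intercept column; it is dropped,
    and every remaining column is centered on its own mean.
    """
    X = np.asarray(X, dtype=float)
    means_X = X[:, 1:].mean(axis=0)  # column means of X before centering
    Xc = X[:, 1:] - means_X          # centered version of X, shape (N, K - 1)
    return Xc, means_X

# Toy design matrix (hypothetical data): intercept plus two predictors
X = np.array([[1.0, 2.0, 10.0],
              [1.0, 4.0, 20.0],
              [1.0, 6.0, 60.0]])
Xc, means_X = center_predictors(X)
```

You would then use Xc in place of X[:, 1:] in the linear predictor and keep the intercept as a separate parameter. Note that with centered predictors the intercept is the expected response at the predictor means; if you need it on the original scale, it should be recoverable as intercept - np.dot(means_X, b), where b stands for your slope coefficients.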