I use 1000 tuning and 1000 sampling iterations with 4 chains in all 3 systems.
I also tried 2000 tuning samples with pymc3. The results improve, but I still see occasional divergences.
param         mean     sd   hdi_3%  hdi_97%  mcse_mean  mcse_sd  ess_bulk  ess_tail  r_hat
w[0]        -0.085  3.621   -7.281    6.183      0.122    0.105     993.0     975.0   1.01
w[1]        -4.917  3.438  -11.145    1.601      0.113    0.093    1058.0    1062.0   1.00
w[2]        -4.177  2.849   -9.466    1.148      0.099    0.070     957.0     851.0   1.01
w[3]         7.440  2.749    2.189   12.611      0.066    0.048    1733.0    2008.0   1.00
w[4]        -1.425  2.922   -7.102    3.886      0.063    0.048    2130.0    2612.0   1.00
w[5]        11.256  3.858    4.259   18.376      0.128    0.102    1058.0     998.0   1.00
w[6]         8.562  3.412    1.900   14.690      0.118    0.094     969.0     857.0   1.01
w[7]         0.021  1.867   -3.380    3.591      0.050    0.039    1481.0    1218.0   1.00
w[8]        -6.342  1.729   -9.586   -3.130      0.047    0.033    1395.0    1375.0   1.01
w[9]         1.902  1.682   -1.062    5.163      0.038    0.027    1966.0    2294.0   1.00
b           14.815  4.701    6.633   23.998      0.144    0.102    1211.0    1071.0   1.00
w_param_1    6.894  2.078    3.730   10.889      0.053    0.041    1862.0    1738.0   1.00
w_param_0    1.919  0.464    1.163    2.807      0.008    0.006    3397.0    2546.0   1.00
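
As a rough way to read the summary above, here is a minimal sketch that screens the ess_bulk, ess_tail and r_hat columns. The rows are copied from the table; the cutoffs (at least 100 effective samples per chain, r_hat no larger than 1.01) are assumed rules of thumb rather than anything fixed by the sampler, and the names `flag_rows`, `MIN_ESS` and `MAX_RHAT` are made up for this example.

```python
# (name, ess_bulk, ess_tail, r_hat), copied from the summary table above
rows = [
    ("w[0]", 993.0, 975.0, 1.01),
    ("w[1]", 1058.0, 1062.0, 1.00),
    ("w[2]", 957.0, 851.0, 1.01),
    ("w[3]", 1733.0, 2008.0, 1.00),
    ("w[4]", 2130.0, 2612.0, 1.00),
    ("w[5]", 1058.0, 998.0, 1.00),
    ("w[6]", 969.0, 857.0, 1.01),
    ("w[7]", 1481.0, 1218.0, 1.00),
    ("w[8]", 1395.0, 1375.0, 1.01),
    ("w[9]", 1966.0, 2294.0, 1.00),
    ("b", 1211.0, 1071.0, 1.00),
    ("w_param_1", 1862.0, 1738.0, 1.00),
    ("w_param_0", 3397.0, 2546.0, 1.00),
]

CHAINS = 4
MIN_ESS = 100 * CHAINS  # assumed rule of thumb: about 100 effective samples per chain
MAX_RHAT = 1.01         # assumed cutoff for "chains have mixed"


def flag_rows(rows, min_ess=MIN_ESS, max_rhat=MAX_RHAT):
    """Return names whose smaller ESS is below min_ess or whose r_hat exceeds max_rhat."""
    return [
        name
        for name, ess_bulk, ess_tail, r_hat in rows
        if min(ess_bulk, ess_tail) < min_ess or r_hat > max_rhat
    ]


flagged = flag_rows(rows)

# The three parameters with the smallest effective sample size (bulk or tail).
worst = sorted(rows, key=lambda r: min(r[1], r[2]))[:3]

print("flagged:", flagged)
print("lowest ESS:", [name for name, *_ in worst])
```

Under these assumed cutoffs nothing in the table is flagged, and the lowest effective sample sizes belong to w[2], w[6] and w[0]. Divergences do not show up in this table, so they have to be checked separately from the sampler statistics.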