For a linear model like this, you can usually get a bit of a performance improvement by using the MAP estimate as the starting value.
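Here is a minimal sketch of what "MAP as a starting value" computes. It assumes a Gaussian linear model with a known noise scale and independent Normal(0, 10) priors on 21 coefficients. The data, priors and noise scale are hypothetical and do not come from your model. The snippet finds the posterior mode with `scipy.optimize.minimize`, then checks it against the closed-form ridge solution that exists for this conjugate case. In PyMC the corresponding steps would be `pm.find_MAP()`, with its result passed to `pm.sample` as the initial point (`initvals=` in PyMC v4 and later, `start=` in PyMC3).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical data: 21 predictors, matching the size of the model discussed.
rng = np.random.default_rng(0)
n, p = 100, 21
X = rng.normal(size=(n, p))
true_beta = rng.normal(size=p)
sigma = 1.0      # assumed known noise scale, to keep the sketch short
prior_sd = 10.0  # assumed Normal(0, prior_sd) prior on every coefficient
y = X @ true_beta + rng.normal(scale=sigma, size=n)

def neg_log_posterior(beta):
    # Gaussian log-likelihood plus Gaussian log-prior, up to additive constants
    resid = y - X @ beta
    return 0.5 * resid @ resid / sigma**2 + 0.5 * beta @ beta / prior_sd**2

def grad(beta):
    # Gradient of neg_log_posterior with respect to beta
    return -X.T @ (y - X @ beta) / sigma**2 + beta / prior_sd**2

res = minimize(neg_log_posterior, x0=np.zeros(p), jac=grad, method="L-BFGS-B")
map_beta = res.x  # this vector is what you would hand to the sampler as its start

# Conjugate case: the MAP equals the ridge solution (X^T X + lam I)^-1 X^T y
lam = sigma**2 / prior_sd**2
closed_form = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(f"max |MAP - closed form| = {np.max(np.abs(map_beta - closed_form)):.2e}")
```

With a wide prior like this the MAP sits close to the least-squares fit, so the sampler starts near the bulk of the posterior instead of at an arbitrary point.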
Also, sampling is not necessarily fast just because the data set is small: you have 21 predictors in your linear equation, so more data might actually help the model converge, since the likelihood then carries more information.
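A rough illustration of the point about data size follows. It relies on simplifying assumptions that are not from your model: Gaussian noise with known scale 1, independent Normal(0, 10) priors on 21 coefficients, and standard-normal predictors. Under those assumptions, the average posterior standard deviation of the coefficients, computed from the conjugate posterior covariance, shrinks as the number of observations grows.

```python
import numpy as np

def mean_coef_posterior_sd(n, p=21, sigma=1.0, prior_sd=10.0, seed=0):
    """Average posterior sd of the coefficients in a Gaussian linear model
    with known noise scale `sigma` and Normal(0, prior_sd) priors.

    Conjugate case: posterior covariance = (X^T X / sigma^2 + I / prior_sd^2)^-1.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))  # hypothetical standard-normal predictors
    precision = X.T @ X / sigma**2 + np.eye(p) / prior_sd**2
    cov = np.linalg.inv(precision)
    return float(np.sqrt(np.diag(cov)).mean())

sizes = (30, 100, 1000)
sds = {n: mean_coef_posterior_sd(n) for n in sizes}
for n in sizes:
    print(f"n = {n:5d}: mean posterior sd of coefficients = {sds[n]:.3f}")
```

When there are only slightly more rows than predictors, the coefficients are weakly identified and the posterior is wide, which tends to make the geometry harder for the sampler. This sketch shows the general trend only. It says nothing specific about your data.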
One more observation: your kappa prior is concentrated near zero, while the posterior of kappa is quite large. I think this mismatch is the main cause of the divergences, so you should change the prior to something wider.
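One quick way to see the mismatch is to check how much prior mass lies in the region where the posterior ends up. The post does not give the actual kappa prior or posterior, so everything below is a hypothetical stand-in: an Exponential(rate=1) prior plays the "concentrated near zero" prior, HalfNormal(sigma=50) plays a wider alternative, and 20 is used as the "quite large" posterior value.

```python
import math

# Hypothetical stand-in for where the posterior of kappa ends up.
POSTERIOR_KAPPA = 20.0

def exponential_tail(rate: float, k: float) -> float:
    """P(kappa > k) when kappa ~ Exponential(rate)."""
    return math.exp(-rate * k)

def half_normal_tail(sigma: float, k: float) -> float:
    """P(kappa > k) when kappa ~ HalfNormal(sigma)."""
    return math.erfc(k / (sigma * math.sqrt(2.0)))

narrow = exponential_tail(rate=1.0, k=POSTERIOR_KAPPA)  # mass piled up near zero
wide = half_normal_tail(sigma=50.0, k=POSTERIOR_KAPPA)  # much wider alternative

print(f"narrow prior: P(kappa > {POSTERIOR_KAPPA}) = {narrow:.2e}")
print(f"wide prior:   P(kappa > {POSTERIOR_KAPPA}) = {wide:.2f}")
```

If almost none of the prior mass lies where the likelihood pulls kappa, the sampler has to work against the prior in that region, which is a plausible source of divergences. A wider prior removes most of that tension.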