Okay, I tried it on my laptop with the max set to 2 and then with it set to 1. In both cases, all 16 threads on my laptop ran at 100%.
Just to be clear, this does not happen when my data size is small (1000), but it does when it is larger (10000). With the smaller sample, only four threads are used (one for each chain).