Harnessing multiple cores to speed up fits with a small number of chains

This is an old thread, but I was wondering if there has been any progress on the topic? Is @sham_doran and @junpenglao’s approach still the best way to share tuning information among multiple sampler threads?