Limiting the number of cores/threads used in PyMC 5.6+

On the optimal number of threads: I have seen in other (non-PyMC) applications that using fewer threads can improve performance. This happens in particular when the tasks are memory-intensive. Moving data between memory and the cores then becomes the bottleneck, and extra threads only compete for the same memory bandwidth instead of speeding things up.
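A minimal sketch of how thread counts are commonly capped, assuming the threads in question come from the BLAS/OpenMP backends used by NumPy and PyTensor. The helper name limit_threads is hypothetical, and the variable list covers the usual backends (OpenMP, OpenBLAS, MKL, Apple Accelerate, numexpr) rather than anything PyMC-specific. These variables are typically read only once, when the library is loaded, so they have to be set before the numeric stack is imported.

```python
import os

# Environment variables read by common threaded numeric backends:
# OpenMP, OpenBLAS, Intel MKL, Apple Accelerate (vecLib) and numexpr.
# They are typically read once, when the library is first loaded,
# so they must be set BEFORE numpy / pytensor / pymc are imported.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "VECLIB_MAXIMUM_THREADS",
    "NUMEXPR_NUM_THREADS",
)


def limit_threads(n: int) -> dict[str, str]:
    """Cap every listed backend thread pool at n threads (hypothetical helper)."""
    if n < 1:
        raise ValueError("thread count must be at least 1")
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n)
    # Return what was set so the caller can log or check it.
    return {var: os.environ[var] for var in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(limit_threads(2))
    # Only after this point import the numeric stack, e.g.:
    # import pymc as pm
```

Note that pm.sample(cores=...) is a separate knob: it sets how many chains run in parallel processes. Child processes inherit these environment variables, so the total thread count is roughly cores × n, which is worth keeping in mind when trying a few values of n to find the fastest setting for a memory-bound model.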

It can also help to try different linear algebra (BLAS) libraries, such as OpenBLAS or MKL.
However, np.show_config() is known not to always report accurately which library is actually in use: