Best practice for parallel model selection, especially avoidance of recompilation

@Dekermanjian

With a Ryzen 5700X, compilation took about 10 seconds; with a Ryzen 5500U, it took about 30 seconds. In both cases sampling took only about 1-2 seconds, so most of the time was spent compiling the model rather than actually sampling.
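In case it helps anyone reproduce these numbers, here is a minimal sketch of how the two phases could be timed separately. It uses only the standard library. `compile_model` is a hypothetical stand-in, not a real library call: swap in whatever compile and sample steps your own setup exposes. `timed` and `compile_fraction` are helper names I made up for illustration.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def compile_fraction(compile_s, sample_s):
    """Share of the total wall time that went into compilation."""
    return compile_s / (compile_s + sample_s)


def compile_model():
    """Hypothetical stand-in for the expensive compile step."""
    time.sleep(0.05)  # pretend to compile
    # Return a callable that plays the role of the (cheap) sampling step.
    return lambda: time.sleep(0.01)


results = {}
with timed("compile", results):
    sampler = compile_model()
with timed("sample", results):
    sampler()

share = compile_fraction(results["compile"], results["sample"])
print(f"compile: {results['compile']:.3f}s, sample: {results['sample']:.3f}s")
print(f"fraction of time spent compiling: {share:.0%}")
```

Plugging the timings above into `compile_fraction` (10 s or 30 s of compilation against 1-2 s of sampling) gives a compile share of roughly 83% to 97%, which is why avoiding recompilation matters more here than speeding up the sampler.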

One more strange thing: if you change the model’s output dimension from (N,) to (N, 1), i.e. add one extra singleton dimension, the compilation time doubles.