To be clear, I think your suggestion is a good one; sorry for not mentioning that earlier.
Theano not only computes gradients; it also performs certain optimizations and runs some calculations in parallel. With the current version of SMC, the overhead of setting up parallel computation is low, and in general `parallel=True` is the best option. According to my tests, `parallel=True` is (slightly) slower than `parallel=False` only for very cheap likelihoods, so the default is to use multithreading. I am also exploring the advantages of using Dask to run on a cluster.
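To illustrate the trade-off, here is a minimal, hypothetical sketch. It is not PyMC3's SMC code; `log_likelihood`, `evaluate_serial` and `evaluate_parallel` are made-up names. It evaluates a toy likelihood over a set of particles, once serially and once through a thread pool. Creating the pool and dispatching tasks is a fixed cost, so when each likelihood call is very cheap that cost can dominate. Whether threads give a real speedup also depends on whether the likelihood code releases Python's GIL.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def log_likelihood(theta, data):
    # Hypothetical toy likelihood: Gaussian with mean theta and unit variance.
    return sum(-0.5 * (x - theta) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)


def evaluate_serial(particles, data):
    # One likelihood evaluation per particle, in order, on a single thread.
    return [log_likelihood(t, data) for t in particles]


def evaluate_parallel(particles, data, workers=4):
    # Setting up the pool and dispatching tasks is the fixed parallel overhead.
    # pool.map returns results in the same order as the input particles.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: log_likelihood(t, data), particles))


def timed(fn, *args):
    # Return the function's result together with its wall-clock time.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    particles = [i / 100 for i in range(200)]
    data = [0.1, -0.3, 0.7, 1.2]
    serial, t_serial = timed(evaluate_serial, particles, data)
    parallel, t_parallel = timed(evaluate_parallel, particles, data)
    assert serial == parallel
    print(f"serial: {t_serial:.4f}s, parallel: {t_parallel:.4f}s")
```

With a likelihood this cheap, the serial version will often win. Making `log_likelihood` more expensive is what lets the parallel version pay off.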
Btw, if you want to submit a PR, go ahead; it would be very welcome. If not, I will try your suggestion in a week or so, after I come back from vacation.