I am not sure, but this is exactly how you would run into the compile lock again, because each model you have would then be built during the initialisation of pm.sample, depending on how you finally run your parallel code.
I would suggest you initialise your sampler first, so that the model / Theano function is compiled only once.
Then you can run the sampling in parallel with sample, but hand over the step (the initialised sampler object). The fork copies the compiled Theano graph to the child processes, and you then do the set_value() in the initialisation of each child process. I have done something like this here:
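A minimal sketch of what I mean, assuming a simple PyMC3 model with a theano.shared data container (the names data_shared and run_chain are just placeholders, not taken from the repo, and this relies on the fork start method):

```python
import multiprocessing as mp

import numpy as np
import pymc3 as pm
import theano

# Shared data container: set_value() can swap its contents later
# without triggering a recompilation of the Theano graph.
data_shared = theano.shared(np.zeros(100), name='data')

with pm.Model() as model:
    mu = pm.Normal('mu', mu=0.0, sigma=1.0)
    pm.Normal('obs', mu=mu, sigma=1.0, observed=data_shared)
    # Compile the sampler once, in the parent process,
    # so the compile lock is only ever taken here.
    step = pm.Metropolis()

def run_chain(dataset):
    # Child process: the compiled graph was copied over by the fork,
    # so only the shared value needs to be updated here.
    data_shared.set_value(dataset)
    with model:
        trace = pm.sample(1000, step=step, chains=1, cores=1,
                          progressbar=False)
    return np.asarray(trace['mu'])

if __name__ == '__main__':
    datasets = [np.random.randn(100) for _ in range(4)]
    # Relies on 'fork' (the default start method on Linux), so the
    # children inherit the already-compiled model and step objects.
    with mp.Pool(4) as pool:
        results = pool.map(run_chain, datasets)
```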
The update_weights there basically does a shared.set_value() after it has calculated the new weights.
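In essence it is the same pattern again; roughly like this (weights_shared and the placeholder logic are hypothetical stand-ins, not the actual beat code):

```python
import numpy as np
import theano

# Hypothetical shared variable that the compiled model graph reads
# its weights from.
weights_shared = theano.shared(np.ones(5), name='weights')

def update_weights(new_values):
    # Recalculate / receive the new weights, then push them into the
    # already-compiled graph; no recompilation, no compile lock.
    weights_shared.set_value(new_values.astype(theano.config.floatX))

update_weights(np.random.randn(5))
```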
The init_chain_hypers function is then used here, where the sampling is parallelized: https://github.com/hvasbath/beat/blob/master/src/models/base.py#L236
Good luck!