Behrens' Bayesian Learner Model : How to share parameters between steps?

I think you are right.

And I understand why PyMC3 (or Stan) cannot implement this model well. I expected those program will draw the whole distribution between trials originally. But the truth is that they only sample some value from the given distribution which PyMC3 cannot update dynamically.