Implementation Questions: Thompson Sampling and Switching Models

Your second question is simpler to answer, so I’ll start there. You can see such a switching model implemented in the mixture model docs.

The tricky part with bandits in PyMC3 is incrementally updating your posteriors as you see feedback. I don't have a good solution for that at the moment. If you want to calculate the arm-allocation probabilities via Thompson sampling from a given set of posterior samples, you may be able to use `sample_ppc`: essentially, you would count how often each arm's sampled posterior reward was the largest.
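The counting step above can be sketched with plain NumPy. This is a minimal illustration, not PyMC3-specific: the Beta posteriors, the success/failure counts, and the sample size are all made-up assumptions standing in for whatever posterior draws you actually have (e.g. from a trace or `sample_ppc` output).

```python
import numpy as np

# Hypothetical setup: pretend these are posterior draws of each arm's
# expected reward. Here we fake them with Beta posteriors for three
# Bernoulli arms (uniform prior + assumed success/failure counts).
rng = np.random.default_rng(42)
n_samples = 10_000
successes = np.array([12, 30, 25])  # assumed observed successes per arm
failures = np.array([8, 20, 30])    # assumed observed failures per arm

# posterior_samples[i, k] = i-th posterior draw of arm k's reward rate
posterior_samples = rng.beta(1 + successes, 1 + failures,
                             size=(n_samples, 3))

# Thompson-sampling allocation: the probability of playing arm k is the
# fraction of draws in which arm k's sampled reward is the largest.
best_arm = posterior_samples.argmax(axis=1)
allocation = np.bincount(best_arm, minlength=3) / n_samples
print(allocation)  # one probability per arm, summing to 1
```

The same `argmax`-and-count trick works regardless of how the posterior draws were produced, so you can swap the fake Beta draws for real samples from your model.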