Implementation Questions: Thompson Sampling and Switching Models

Hi Folks,

I’m new to PyMC3 and have begun digging into the API documentation to get a handle on model specification. There are two classes of random variables I’d like to characterize but am unsure how to specify, and I’m hoping someone can clarify how to express the following.

The first type of random variable I want to characterize is one that arises in Thompson Sampling, a standard Bayesian approach to multi-armed bandit problems. In such a setting, we have a probability distribution P(r_i) over the reward from each arm i. At any time t, the next arm to pull is:

k’_t = argmax_i r’_i

where r’_i for all i are realizations drawn from the arm reward distributions. I’m ultimately interested in estimating the multinomial distribution P(k_t) and am wondering whether there’s a simple way to write a probabilistic program for the above.
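To make the process concrete, here is a rough NumPy sketch of a single Thompson step. The Beta draws are just placeholders I made up for illustration; in my actual model they would be whatever the arm reward distributions turn out to be:

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder reward distributions for a 3-armed bandit; in the real
# model these would be posteriors over each arm's expected reward.
def draw_rewards():
    return np.array([
        rng.beta(2, 5),   # arm 0
        rng.beta(3, 4),   # arm 1
        rng.beta(5, 2),   # arm 2
    ])

# One Thompson step: draw r'_i for every arm, then pull the argmax.
r_prime = draw_rewards()
k_prime = int(np.argmax(r_prime))
```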

Somewhat related to the above are switching models. For the sake of discussion, let’s assume we have a two-armed bandit with reward random variables r_1 and r_2. Now, instead of computing the argmax over samples from those random variables, we draw a sample s’ from a Bernoulli random variable s and emit r’_1 if s’=0 and r’_2 if s’=1. Is there a mechanism to declare a random variable that results from such a switching process in PyMC3?

Appreciate any clarification here. I’m trying to map the models I have in mind into the PyMC3 framework.

I believe the generalization of the above questions is the following. Can one write a general Python function that yields samples from a random variable and declare that random variable in the specification of a more complex PyMC3 model?



Your second question is simpler to answer, so I’ll start there. You can see such a switching model implemented in the mixture model docs.

The tricky part with bandits in PyMC3 is incrementally updating your posteriors as you see feedback. I don’t have a good solution to that at the moment. If you want to calculate the arm allocation probabilities using Thompson sampling from a given set of posterior samples, you may be able to use sample_ppc. Basically you would count how often each arm’s sampled posterior reward was the largest.
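To make the counting idea concrete, here is a sketch using made-up posterior samples; in practice each column would come from your trace (or from sample_ppc output) rather than being generated directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in posterior samples of each arm's expected reward;
# samples[:, i] holds 5000 draws for arm i.
samples = np.column_stack([
    rng.normal(0.40, 0.05, size=5000),  # arm 0
    rng.normal(0.55, 0.05, size=5000),  # arm 1
])

# Thompson allocation: how often each arm's draw is the largest.
winners = np.argmax(samples, axis=1)
alloc = np.bincount(winners, minlength=samples.shape[1]) / len(winners)
# alloc[i] estimates P(k_t = i)
```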

Thanks Austin. I’ll take a look at the mixture model class. For my needs, it’s not a big deal to reconstitute the bandit model from scratch as new data comes in.

For incrementally updating posteriors, a workaround is detailed here.