Implementation Questions: Thompson Sampling and Switching Models

ChrisDiehl · August 16, 2017, 6:21pm

Hi Folks,

I’m new to PyMC3 and have begun digging into the API documentation to get a handle on model specification. There are two classes of random variables I want to model but am unsure how to specify, and I’m wondering if someone can clarify how to express them.

The first type of random variable I want to characterize is one that arises in Thompson Sampling, a standard Bayesian approach used to address multi-armed bandit problems. In such a setting, we have a probability distribution P(r_i) for the reward from each arm i. At each time t, the arm chosen to pull next is:

k’_t = argmax_i r’_i

where each r’_i is a realization drawn from arm i’s reward distribution. I’m ultimately interested in estimating the multinomial distribution P(k_t), and am wondering whether there’s a simple way to write a probabilistic program for the above.

Somewhat related to the above are switching models. For the sake of discussion, let’s assume we have a two-armed bandit with reward random variables r_1 and r_2. Now, instead of computing the argmax over samples from those random variables, we draw a sample s’ from a Bernoulli random variable s and emit r’_1 if s’=0 and r’_2 if s’=1. Is there a mechanism to declare a random variable that results from such a switching process in PyMC3?

Appreciate any clarification here. I’m trying to map the models I have in mind into the PyMC3 framework.

I believe the generalization of the above questions is the following. Can one write a general Python function that yields samples from a random variable and declare that random variable in the specification of a more complex PyMC3 model?

Chris

AustinRochford · August 17, 2017, 12:48pm

Your second question is simpler to answer, so I’ll start there. You can see such a switching model implemented in the mixture model docs.
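To make the connection to mixtures concrete, here is a minimal plain-Python sketch of the switching process from the question (not PyMC3 code). It assumes, purely for illustration, Gaussian rewards and a switch probability of 0.3; the names `sample_switching`, `draw_r1`, and `draw_r2` are hypothetical. Drawing s’ from a Bernoulli and then emitting r’_1 or r’_2 is the same as sampling from a two-component mixture with weights (1 - p, p), which is why a mixture distribution is a natural fit.

```python
import random


def sample_switching(rng, p_switch, draw_r1, draw_r2):
    """Draw s' ~ Bernoulli(p_switch); emit r'_1 if s' == 0, else r'_2."""
    s = 1 if rng.random() < p_switch else 0
    return draw_r2(rng) if s == 1 else draw_r1(rng)


rng = random.Random(42)
p_switch = 0.3  # assumed P(s = 1), for illustration only


# Assumed reward distributions for the two arms (illustrative Gaussians).
def draw_r1(r):
    return r.gauss(0.0, 1.0)


def draw_r2(r):
    return r.gauss(5.0, 1.0)


samples = [sample_switching(rng, p_switch, draw_r1, draw_r2) for _ in range(20000)]

# The mean of a mixture is the weighted average of the component means.
sample_mean = sum(samples) / len(samples)
expected_mean = (1 - p_switch) * 0.0 + p_switch * 5.0
print(sample_mean, expected_mean)
```

The two printed values should agree up to Monte Carlo error, consistent with the mixture reading of the switching model.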

The tricky part with bandits in PyMC3 is incrementally updating your posteriors as you see feedback. I don’t have a good solution to that at the moment. If you want to calculate the arm allocation probabilities using Thompson sampling from a given set of posterior samples, you may be able to use sample_ppc. Basically you would count how often each arm’s sampled posterior reward was the largest.
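The counting step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the posterior draws here come from hypothetical Beta posteriors generated with the standard library rather than from a PyMC3 trace, and `arm_allocation_probs` is a made-up helper name. In practice you would pass in one row per posterior sample, with one sampled reward per arm.

```python
import random
from collections import Counter


def arm_allocation_probs(posterior_draws):
    """Estimate P(k_t = i) as the fraction of draws in which arm i is the argmax.

    posterior_draws: list of rows, each row holding one sampled reward per arm.
    """
    n_arms = len(posterior_draws[0])
    wins = Counter()
    for draw in posterior_draws:
        best = max(range(n_arms), key=lambda i: draw[i])  # argmax_i r'_i
        wins[best] += 1
    total = len(posterior_draws)
    return [wins[i] / total for i in range(n_arms)]


rng = random.Random(0)

# Hypothetical Beta(alpha, beta) posteriors for three Bernoulli arms.
beta_params = [(3, 9), (6, 6), (10, 2)]
draws = [[rng.betavariate(a, b) for a, b in beta_params] for _ in range(5000)]

probs = arm_allocation_probs(draws)
print(probs)
```

The resulting list is a Monte Carlo estimate of the arm allocation probabilities; more posterior draws give a tighter estimate.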

ChrisDiehl · August 17, 2017, 2:13pm

Thanks Austin. I’ll take a look at the mixture model class. For my needs, it’s not a big deal to reconstitute the bandit model from scratch as new data comes in.

9f0sdau90 · December 30, 2017, 2:14pm

For incrementally updating posteriors, a workaround is detailed here: https://stackoverflow.com/questions/40870840/incremental-model-update-pymc3
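As a side note, and not a reproduction of the linked workaround: in the special case of Bernoulli rewards with a Beta prior, the posterior can be updated incrementally in closed form, with no sampler involved. The sketch below assumes that conjugate setup; `update_beta` is a hypothetical helper, and the reward batches are made-up data.

```python
def update_beta(alpha, beta, rewards):
    """Conjugate update of a Beta(alpha, beta) prior given 0/1 rewards."""
    successes = sum(rewards)
    failures = len(rewards) - successes
    return alpha + successes, beta + failures


# Start from a uniform Beta(1, 1) prior and fold in data batch by batch.
posterior = (1, 1)
for batch in ([1, 0, 1], [1, 1, 0, 0]):
    posterior = update_beta(*posterior, batch)

# Updating on all the data at once gives the same posterior.
all_at_once = update_beta(1, 1, [1, 0, 1, 1, 1, 0, 0])
print(posterior, all_at_once)
```

For non-conjugate models this shortcut does not apply, which is where refitting from scratch or an approach like the one linked above comes in.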
