Thompson sampling example

I also could have just directly used a RandomStream, as in here, yes?

Looking back, I regret putting that bit in the code I presented, because it’s a bit too low-level. Here’s a more straightforward implementation, although it’s significantly slower (pm.draw compiles its sampling function on every call):

import numpy as np
import pymc as pm
import pytensor


class ConjugateAgent:

    def __init__(self, n_actions, initial_alphas=None, initial_betas=None):
        self.n_actions = n_actions

        if initial_alphas is None:
            initial_alphas = np.ones(n_actions)
        elif isinstance(initial_alphas, (int, float)):
            initial_alphas = np.ones(n_actions) * initial_alphas
        if initial_betas is None:
            initial_betas = np.ones(n_actions)
        elif isinstance(initial_betas, (int, float)):
            initial_betas = np.ones(n_actions) * initial_betas
        
        self.alphas = pytensor.shared(initial_alphas, name='alphas')
        self.betas = pytensor.shared(initial_betas, name='betas')
        self.n_pulls = pytensor.shared(np.zeros(n_actions), name='n_pulls')
        
        # One Beta(alpha, beta) belief per action, backed by the shared variables above
        self.action_distribution = pm.Beta.dist(alpha=self.alphas, beta=self.betas)

    def choose_action(self, n_pulls=1):
        # Draw one sample per action for each pull, then pick the action with the largest draw
        action_probs = pm.draw(self.action_distribution, draws=n_pulls)
        return np.argmax(action_probs, axis=-1).squeeze()

The learn method is unchanged.
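
Since the learn method isn't repeated here, the following is a rough sketch of the whole loop in plain Python, with no PyMC or PyTensor involved. It assumes 0/1 rewards and the usual Beta-Bernoulli conjugate update (alpha += reward, beta += 1 - reward), which may not match the learn method from the earlier post exactly. PlainThompsonAgent and run_bandit are hypothetical names made up for this illustration.

```python
import random


class PlainThompsonAgent:
    """Stdlib-only illustration; not the PyMC/PyTensor agent above."""

    def __init__(self, n_actions, seed=None):
        # Beta(1, 1) prior on every arm, i.e. uniform over success probabilities
        self.alphas = [1.0] * n_actions
        self.betas = [1.0] * n_actions
        self.n_pulls = [0] * n_actions
        self.rng = random.Random(seed)

    def choose_action(self):
        # One draw per arm from its Beta posterior; play the arm with the largest draw
        draws = [self.rng.betavariate(a, b) for a, b in zip(self.alphas, self.betas)]
        return max(range(len(draws)), key=draws.__getitem__)

    def learn(self, action, reward):
        # Assumed conjugate update for a 0/1 reward (not taken from the original post)
        self.alphas[action] += reward
        self.betas[action] += 1 - reward
        self.n_pulls[action] += 1


def run_bandit(true_probs, n_rounds=2000, seed=0):
    # Simulate a Bernoulli bandit with the given (hypothetical) success probabilities
    agent = PlainThompsonAgent(len(true_probs), seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(n_rounds):
        action = agent.choose_action()
        reward = 1 if env_rng.random() < true_probs[action] else 0
        agent.learn(action, reward)
    return agent
```

Calling run_bandit([0.2, 0.5, 0.8]) and inspecting agent.n_pulls should show most pulls going to the last arm, which is the behaviour to expect from the PyMC version as well.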