I also could have just directly used a RandomStream, as in here, yes?
Looking back, I regret putting that bit in the code I presented, because it’s too low-level. Here’s a more obvious implementation, although it’s significantly slower:
import numpy as np
import pymc as pm
import pytensor

class ConjugateAgent:
    def __init__(self, n_actions, initial_alphas=None, initial_betas=None):
        self.n_actions = n_actions
        # Default to a flat Beta(1, 1) prior; a scalar is broadcast to every action.
        if initial_alphas is None:
            initial_alphas = np.ones(n_actions)
        elif isinstance(initial_alphas, (int, float)):
            initial_alphas = np.ones(n_actions) * initial_alphas
        if initial_betas is None:
            initial_betas = np.ones(n_actions)
        elif isinstance(initial_betas, (int, float)):
            initial_betas = np.ones(n_actions) * initial_betas
        # Shared variables, so updated values are picked up by later draws.
        self.alphas = pytensor.shared(initial_alphas, name='alphas')
        self.betas = pytensor.shared(initial_betas, name='betas')
        self.n_pulls = pytensor.shared(np.zeros(n_actions), name='n_pulls')
        self.action_distribution = pm.Beta.dist(alpha=self.alphas, beta=self.betas)

    def choose_action(self, n_pulls=1):
        # One Beta draw per action; pick the action with the largest draw.
        action_probs = pm.draw(self.action_distribution, n_pulls)
        return np.argmax(action_probs, axis=-1).squeeze()
The learn method is unchanged.
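If it helps to see what choose_action is doing without any PyMC or PyTensor machinery, here is a rough standard-library sketch of the same idea: draw once from each action's Beta distribution and take the argmax. The class name ThompsonSketch and the seed argument are made up for illustration, and the learn method shown is only the textbook Beta-Bernoulli update for 0/1 rewards, which I am assuming here; it is not necessarily the learn method from the earlier post.

```python
import random


class ThompsonSketch:
    """Stdlib-only stand-in for the agent above (hypothetical name)."""

    def __init__(self, n_actions, alpha=1.0, beta=1.0, seed=None):
        # One Beta(alpha, beta) belief per action, stored as plain lists.
        self.alphas = [float(alpha)] * n_actions
        self.betas = [float(beta)] * n_actions
        self.rng = random.Random(seed)

    def choose_action(self):
        # One draw from each action's Beta distribution, then take the argmax.
        draws = [self.rng.betavariate(a, b)
                 for a, b in zip(self.alphas, self.betas)]
        return max(range(len(draws)), key=draws.__getitem__)

    def learn(self, action, reward):
        # Assumed conjugate update for a reward in {0, 1}:
        # a success bumps alpha, a failure bumps beta.
        self.alphas[action] += reward
        self.betas[action] += 1 - reward


agent = ThompsonSketch(n_actions=3, seed=0)
action = agent.choose_action()
agent.learn(action, reward=1)
```

Because each call only makes a handful of betavariate draws, this is cheap, but it gives up the shared-variable setup that lets the PyMC version plug into a larger graph.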