Theano.scan in Hierarchical models (Bayesian Reinforcement Learning)

I want to apply theano.scan over multiple parameters in a hierarchical model.

The code works great for a single model; however, I'm not sure how to convert it to a hierarchical model (which requires additional looping).

Here is the code for the single model:


  import arviz as az
  import pymc3 as pm
  import theano
  import theano.tensor as tt

  with pm.Model() as bayesian_reinforcement_learning:

    # data
    actions_  = pm.Data('actions', actions)
    rewards_  = pm.Data('rewards', rewards)

    # priors
    alpha = pm.Beta('alpha', 1, 1)
    beta  = pm.HalfNormal('beta', 10)

    # model
    # init Qs: one starting value per action (~1/3 each)
    Qs = 0.3333 * tt.ones(3, dtype='float64')

    # Compute the Q values for each trial; scan stacks the updated Qs
    # after every trial, so the output has shape (n_trials, 3)
    Qs, _ = theano.scan(
        fn=lambda action, reward, Qs, alpha: tt.set_subtensor(Qs[action], Qs[action] + alpha * (reward - Qs[action])),
        sequences=[actions_, rewards_],
        outputs_info=[Qs],
        non_sequences=[alpha])


    BQs = beta * Qs
    pi = tt.nnet.softmax(BQs)
    like = pm.Categorical('like', p=pi, observed=actions_)

    # sampling
    trace = pm.sample(tune=1000, chains=2, target_accept=0.85)
    idata = az.from_pymc3(trace)
    az.plot_trace(idata)

And here is the structure for the hierarchical model:

with pm.Model(coords=coords) as hierarchical_bayes_RL:

  # data
  subject_idx = pm.Data("subject_idx", subject_idxs, dims="sub_id")
  actions_  = pm.Data('actions', df.actions.values)
  rewards_  = pm.Data('rewards', df.rewards.values)


  # Hyperpriors
  a_alpha = pm.HalfNormal("a_alpha", 5.0)
  b_alpha = pm.HalfNormal("b_alpha", 5.0)
  mu_b    = pm.HalfNormal('mu_b',   10.0)
  sigma_b = pm.HalfNormal("sigma_b", 5.0)
 

  # In the single model above, alpha and beta had fixed hyperparameters;
  # here each subject gets its own alpha and beta, drawn from common
  # group-level distributions (vectors of length n_subjects).

  # parameters
  alpha = pm.Beta('alpha', a_alpha,  b_alpha, dims="subject")
  beta = pm.Normal("beta", mu=mu_b, sigma=sigma_b, dims="subject")


  # model
  # init Qs: do I need 3 Q values per subject here, i.e. a
  # (3 actions, 100 subjects) array?
  Qs = 0.3333 * tt.ones((3, 100), dtype='float64')


  # Compute the Q values for each trial; this is the part I haven't
  # been able to get working for multiple subjects:
  # Qs, _ = theano.scan(
  #     fn=lambda action, reward, Qs, alpha: tt.set_subtensor(Qs[action], Qs[action] + alpha * (reward - Qs[action])),
  #     sequences=[actions_[subject_idx], rewards_[subject_idx]],
  #     outputs_info=[Qs],
  #     non_sequences=[alpha]
  # )
  


  BQs = beta[subject_idx] * Qs[:, subject_idx]
  pi = tt.nnet.softmax(BQs)
  like = pm.Categorical('P[A]', p=pi, observed=actions_, dims='sub_id')


pm.model_to_graphviz(hierarchical_bayes_RL)

If you have a better solution that is completely vectorized, that would be awesome too! Anything that works; I'm new to theano… Thanks!

Hello! Theano isn't being actively developed anymore, so firstly I'd suggest trying Aesara: it's a fork of Theano with all of Theano's functionality and some more! It's maintained by a group of developers, some of whom are PyMC developers as well. Additionally, for best practices with hierarchical models in PyMC, consider having a good look at this:
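On the Aesara point: switching is mostly a matter of renaming the imports, since the API is the same. A tiny (untested) sketch, just a running sum, to show that aesara.scan mirrors theano.scan:

    import aesara
    import aesara.tensor as at

    x = at.vector('x')
    # same interface as theano.scan: sequences, outputs_info, non_sequences
    running_sum, _ = aesara.scan(
        fn=lambda xi, acc: acc + xi,
        sequences=[x],
        outputs_info=[at.zeros(())])
    f = aesara.function([x], running_sum)
    print(f([1.0, 2.0, 3.0]))  # -> [1. 3. 6.]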

For a hacky model with Aesara and some hierarchical modelling, you can look at a talk I gave; the slides and notebook are here: GitHub - mjhajharia/pydata21: Pydata talk - Football Analytics Using Hierarchical Bayesian Models in PyMC
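And since you asked for something vectorized: below is a rough, untested sketch of one way to push all subjects through a single scan, adapted from your own model. It assumes every subject has the same number of trials and that df.actions / df.rewards can be reshaped trial-major into an (n_trials, n_subjects) array with one column per subject; n_subjects = 100 is taken from your Qs shape, and all the names here are placeholders:

    import pymc3 as pm
    import theano
    import theano.tensor as tt

    n_subjects, n_actions = 100, 3

    # assumed layout: row t holds trial t for every subject
    actions_np = df.actions.values.reshape(-1, n_subjects)
    rewards_np = df.rewards.values.reshape(-1, n_subjects)

    with pm.Model() as hierarchical_bayes_RL:
        actions_ = pm.Data('actions', actions_np)
        rewards_ = pm.Data('rewards', rewards_np)

        # hyperpriors
        a_alpha = pm.HalfNormal('a_alpha', 5.0)
        b_alpha = pm.HalfNormal('b_alpha', 5.0)
        mu_b    = pm.HalfNormal('mu_b', 10.0)
        sigma_b = pm.HalfNormal('sigma_b', 5.0)

        # one learning rate and one inverse temperature per subject
        alpha = pm.Beta('alpha', a_alpha, b_alpha, shape=n_subjects)
        beta  = pm.Normal('beta', mu=mu_b, sigma=sigma_b, shape=n_subjects)

        # one row of Q values per subject: (n_subjects, n_actions)
        Qs0 = (1.0 / n_actions) * tt.ones((n_subjects, n_actions))

        def update(action, reward, Qs, alpha):
            # action, reward: (n_subjects,) vectors, one trial per subject
            idx = tt.arange(n_subjects)
            chosen = Qs[idx, action]
            return tt.set_subtensor(Qs[idx, action],
                                    chosen + alpha * (reward - chosen))

        # Qs: (n_trials, n_subjects, n_actions)
        Qs, _ = theano.scan(fn=update,
                            sequences=[actions_, rewards_],
                            outputs_info=[Qs0],
                            non_sequences=[alpha])

        # scale by each subject's beta, then flatten so softmax gets 2D input
        BQs = beta[None, :, None] * Qs
        pi = tt.nnet.softmax(BQs.reshape((-1, n_actions)))

        # note: as in your single-subject model, trial t is scored against
        # Q values that already include trial t's update; you may want to
        # shift Qs by one trial
        like = pm.Categorical('like', p=pi, observed=actions_.reshape((-1,)))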

In case you're interested in understanding hierarchical models in general, I've found this Stan blog post quite nice: Hierarchical Partial Pooling for Repeated Binary Trials

Note: although most things in Stan, PyMC, or other PPLs can easily be translated between one another, sometimes there are subtleties, so feel free to skip the last post if it makes things more confusing.

Hope you find some ideas here!