I posted this issue on the PyMC3 resources GitHub, but maybe this is a better place. In the PyMC3 port of the code for the Statistical Rethinking book, chapter 12, code block 12.34, the following function is written to predict outcomes for new clusters. I am confused by trace['a_actor'] * actor_sim. Where does this multiplication come from? From the book, I assumed we should simply replace trace['a_actor'] with actor_sim. Can anyone explain why the multiplication makes sense here?
trace['a_actor'] holds the posterior samples of the linear model's coefficients. To make a prediction, you need to take the dot product of the coefficients with the new simulated data.
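One way to read the "dot product" description: for an actor seen in training, the actor predictor is effectively a one-hot indicator, and the dot product of each posterior sample's 7 intercepts with that indicator reduces to selecting one column. The sketch below uses random numbers as a hypothetical stand-in for trace['a_actor']; it is not output from the actual model.

```python
import numpy as np

# Hypothetical stand-in for trace['a_actor']: 1000 posterior samples x 7 actors.
# Random numbers only; not output from the real model.
rng = np.random.default_rng(0)
a_actor = rng.normal(0.0, 2.0, size=(1000, 7))

# Actor 3 encoded as a one-hot indicator vector (index 2, zero-based).
actor_onehot = np.zeros(7)
actor_onehot[3 - 1] = 1.0

# Dot product of each sample's 7 intercepts with the indicator ...
via_dot = a_actor @ actor_onehot
# ... equals simply picking that actor's column.
via_index = a_actor[:, 3 - 1]

print(np.allclose(via_dot, via_index))  # True
```

Note that this equivalence only holds when the multiplier is an indicator of 0s and 1s; multiplying by simulated intercepts is a different operation.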
My confusion is with Code 12.30, where there is no multiplication: the appropriate column of trace['a_actor'] is used on its own in the sum (this is where we predict outcomes for a cluster that was present at training time).
Sorry for being dense. I get the indexing bit: if we are predicting the 3rd actor, we use the trace of intercepts for that actor, trace['a_actor'][:, 3-1].
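A minimal sketch of that indexing approach for an existing actor, assuming the chapter's linear predictor logit(p) = a + a_actor[actor] + (bp + bpC * condition) * prosoc_left. The dictionary keys a, bp, bpC and a_actor, and the random posterior draws, are hypothetical stand-ins for the trace, not the port's actual code or results.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
n = 1000
# Hypothetical posterior draws standing in for trace['a'], trace['bp'],
# trace['bpC'] and trace['a_actor']; not results from the real model.
post = {
    "a": rng.normal(0.4, 0.5, n),
    "bp": rng.normal(0.8, 0.3, n),
    "bpC": rng.normal(-0.1, 0.3, n),
    "a_actor": rng.normal(0.0, 2.0, (n, 7)),
}

# The four treatments as (prosoc_left, condition) pairs.
prosoc_left = np.array([0, 1, 0, 1])
condition = np.array([0, 0, 1, 1])

def p_existing_actor(post, actor, prosoc_left, condition):
    """Predicted probability for an actor seen in training (1-based index)."""
    a_actor = post["a_actor"][:, actor - 1]  # this actor's intercepts, shape (n,)
    logodds = (post["a"] + a_actor)[:, None] + (
        post["bp"][:, None] + post["bpC"][:, None] * condition
    ) * prosoc_left  # broadcast to shape (n, 4)
    return logistic(logodds)

p3 = p_existing_actor(post, 3, prosoc_left, condition)
print(p3.shape)  # (1000, 4)
```

The actor's intercept enters the sum directly; nothing is multiplied by it.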
The 12.34 code, though, I think multiplies the same 7 actor intercepts (i.e. trace['a_actor'] has shape number of samples × 7) by the newly simulated actor intercepts (actor_sim). In the rethinking package, as far as I can tell, link simply replaces trace['a_actor'] with actor_sim.
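A sketch of the "replace, don't multiply" reading for a new actor, assuming the same linear predictor as in chapter 12 and that each posterior sample gets one new intercept drawn from Normal(0, sigma_actor). The posterior arrays, including sigma_actor, are hypothetical random stand-ins for the trace; this is not the PyMC3 port's code.

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(2)
n = 1000
# Hypothetical posterior draws; sigma_actor is made positive with abs().
post = {
    "a": rng.normal(0.4, 0.5, n),
    "bp": rng.normal(0.8, 0.3, n),
    "bpC": rng.normal(-0.1, 0.3, n),
    "sigma_actor": np.abs(rng.normal(2.0, 0.5, n)),
}

prosoc_left = np.array([0, 1, 0, 1])
condition = np.array([0, 0, 1, 1])

# One new actor intercept per posterior sample, drawn from that sample's
# population distribution Normal(0, sigma_actor).
actor_sim = rng.normal(0.0, post["sigma_actor"])  # shape (n,)

def p_new_actor(post, actor_sim, prosoc_left, condition):
    """Replace the a_actor term with simulated intercepts (no multiplication)."""
    logodds = (post["a"] + actor_sim)[:, None] + (
        post["bp"][:, None] + post["bpC"][:, None] * condition
    ) * prosoc_left
    return logistic(logodds)

p_new = p_new_actor(post, actor_sim, prosoc_left, condition)
print(p_new.shape)  # (1000, 4)
```

Column-wise summaries such as p_new.mean(axis=0) and np.percentile(p_new, [5.5, 94.5], axis=0) give a mean and 89% interval per treatment.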
Looking at code in other packages that replicate the rethinking code suggests that this code is not correct. I changed 12.36 to the following, and the plot is closer to the book's: the top end now shows variation instead of sitting at 1 across all the treatments, as it does with the current code.