12.34 Rethinking Code

I put this issue on the Pymc3 resources github, but maybe this is a better place. In the Pymc3 code for the Rethinking Statistics book, chapter 12 code block 12.34 link the following function is written to predict new cluster values. I am confused by trace['a_actor']*actor_sim. Where does this multiplication come from? I assumed from the book that we should just replace trace['a_actor'] with actor_sim. Can anyone explain why the multiplication makes sense here?

def p_link(prosoc_left, condition, actor_sim, trace):
    Nsim = actor_sim.shape[0]//trace.nchains
    trace = trace[:Nsim]
    logodds = trace['a'] + \
                np.mean(trace['a_actor']*actor_sim, axis=1) + \
                (trace['bp'] + trace['bpC'] * condition) * prosoc_left
    return logistic(logodds)

trace['a_actor'] is the posterior of the coefficient of the linear model. To make prediction you need to do dot multiply of the coefficient with the new simulated data.

There is a coefficient on the intercept?

My confusion is with Code 12.30 where there is no multiplication, the proper column of trace[‘a_actor’] is used alone in the sum (this is where we are predicting outcomes for a cluster that was available at training time).

it is doing indexing in trace['a_actor'][:, actor], you can think of it as doing a multiplication with a matrix of 0 and 1

Sorry for being dense. I get the indexing bit (if we are predicting the 3rd actor, we use the trace of intercepts for that actor (trace[‘a_actor’][:, 3-1]).

The 12.34 code though i think is multiplying the same 7 actor intercepts (i.e. trace[‘a_actor’] is number of samples x 7) and the newly simulated actor intercepts (actor_sim). In the rethinking package, from what I could tell, ‘link’ is simply replacing trace[‘a_actor’] with actor_sim.

Again apology for being dense on this question.

Looking at code written in other packages that replicate rethinking code suggests that the code is not correct. I changed 12.36 to the following and the image is closer to the book, instead of having no variation on the top end (i.e. not 1 across the treatments like it is now)

def p_link(prosoc_left, condition, actor_sim, trace):
    Nsim = actor_sim.shape[0]//trace.nchains
    trace = trace[:Nsim]
    logodds = trace['a'] + \
                np.mean(actor_sim, axis=1) + \
                (trace['bp'] + trace['bpC'] * condition) * prosoc_left
    return logistic(logodds)

pred_raw = np.asarray([p_link(p_l, c_d, a_actor_sims, trace_12_4) 
                       for p_l, c_d in zip(prosoc_left, condition)]).T
pred_p = pred_raw.mean(axis=0)
pred_p_PI = pm.hpd(pred_raw, alpha=.11)

_, ax = plt.subplots(1, 1, figsize=(5, 5))
ax.fill_between(range(4), pred_p_PI[:,1], pred_p_PI[:,0], alpha=0.25, color='k')
ax.plot(pred_p, color='k')

ax.set_ylim(0, 1.1)
ax.set_xlabel("prosoc_left/condition", fontsize=14)
ax.set_ylabel("proportion pulled left", fontsize=14)
plt.xticks(range(4), ("0/0","1/0","0/1","1/1"));

Capture

1 Like

I see. I think you are right - could you please send a pull request?