Hi everyone. I’ve been having trouble understanding how pymc handles indexing. Is there a document somewhere that explains how pymc manages shapes and indexes?
Let me give an example of the kind of problem I’m having. Suppose we want to fit two logistic regressions and do partial pooling between their slopes. I have one predictor (time) and two outcome variables (k). I’ve stored the outcomes in a 2x22 array, along with the n’s in another 2x22 array. Each row represents one cluster and each column represents one point in time. Here’s a data generating process:
# synthetic data
# one predictor variable
# representing time
x = np.arange(22)
# n is a 2x22 matrix representing
# the maximiums at each point in time
n = np.random.randint(100,size=(2,22))
# apply inverse logit transformation on
# x twice with two different paramters
a0 = -4
b0 = 0.1
p0 = np.exp(a0 + b0*x) / (1 + np.exp(a0 + b0*x))
a1 = -2
b1 = 0.08
p1 = np.exp(a1 + b1*x) / (1 + np.exp(a1 + b1*x))
# generate outcome variable,
# two count variables from
# binomial distribution
k0 = stats.binom(n=n[0],p=p0).rvs()
k1 = stats.binom(n=n[1],p=p1).rvs()
# combine to get 2 x 22 outcome array
k = np.array([k0,k1])
So this pymc code works. It’s ugly because it just enumerates both logistic regressions by using hard-coded indexes on the parameters and then another batch of hard-coded indexes to pull out the right data.
with pm.Model() as partial_pool:
hyper_b_mu = pm.Normal('hbmu',0,1)
hyper_b_std = pm.HalfNormal('hbstd',0.5)
a = pm.Normal('a',0,1,shape=2)
b = pm.Normal('b',hyper_b_mu,hyper_b_std,shape=2)
p0 = pm.invlogit(a[0]+b[0]*x)
p1 = pm.invlogit(a[1]+b[1]*x)
y0 = pm.Binomial('y0',n=n[0],p=p0,observed=k[0])
y1 = pm.Binomial('y1',n=n[1],p=p1,observed=k[1])
trace_partial_pool = pm.sample()
I thought that I could just replace my hard-coded subscripts with an array, called cluster below, that takes on the same values (0,1). But this pymc code doesn’t work - aesara throws up “AssertionError: Could not broadcast dimensions” when it tried to compile the model. I’ve tried what feels like a dozen permutations of this indexed pymc code and all of them give me the same error.
# added cluster to manage indexing
cluster = np.array([0,1])
with pm.Model() as partial_pool:
hyper_b_mu = pm.Normal('hbmu',0,1)
hyper_b_std = pm.HalfNormal('hbstd',0.5)
a = pm.Normal('a',0,1,shape=2)
b = pm.Normal('b',hyper_b_mu,hyper_b_std,shape=2)
p = pm.invlogit(a[cluster]+b[cluster]*x)
y = pm.Binomial('y',n=n[cluster],p=p[cluster],observed=k[cluster])
trace_partial_pool = pm.sample()
So anyway, it would be wonderful if someone could explain why the first one works but the second doesn’t when, to my mind, they are the same thing.