Using Pandas within a PyMC3 context?

Hi Chris,

Thank you so much! I finally got some results based on your suggestions. Really learned a lot from you these days!

If there are any further issues, I may bother you again :smiley:

Best,
Fan

Hi Chris,

I just ran into another issue that may be related only to Theano. One of my groupby tasks is implemented with the following code:


zzdat_matrix = tt.zeros((len(GroupId3), 4), dtype=T.config.floatX)

for index,group in enumerate(GroupId3):
    sub_group = yydat_matrix[tt.all(tt.eq(yydat_matrix[:,[0,1,2]].astype('float32'), group), axis = 1)]
    bx = tt.sum(beta[(sub_group[:,3].astype('int32'))-1]*sub_group[:,4])
    tt.set_subtensor(zzdat_matrix[index,[0,1,2]], group)  
    tt.set_subtensor(zzdat_matrix[index,3], bx)

But it runs forever without raising any error. I then found that something may be wrong with sub_group: when I tried sub_group[0], I got an index-out-of-bounds error. But I don't see anything wrong with my boolean indexing of yydat_matrix.

As a quick recap: yydat_matrix is a symbolic matrix with 5 columns, and GroupId3 is a NumPy array with shape 3xn. Since tt.eq works element-wise, I need to apply tt.all after it.
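For reference, the row-matching step described above can be sketched in plain NumPy. The data and the names `data`, `group`, and `missing` are made up for illustration, and NumPy stands in for the symbolic Theano calls (`==` for tt.eq, `np.all` for tt.all). The sketch also shows that a group with no matching rows yields an empty selection, which is one way an index like sub_group[0] can be out of bounds; whether that is what happens in the Theano code is not established here.

```python
import numpy as np

# Hypothetical stand-in for the data matrix: three key columns,
# an integer-like index column, and a value column.
data = np.array([
    [1.0, 2.0, 3.0, 1.0, 0.5],
    [1.0, 2.0, 3.0, 2.0, 1.5],
    [4.0, 5.0, 6.0, 1.0, 2.0],
])
group = np.array([1.0, 2.0, 3.0])

# Element-wise comparison gives one boolean per cell, shape (3, 3).
eq = data[:, [0, 1, 2]] == group

# Reducing over axis 1 keeps only rows where all three keys match.
row_mask = np.all(eq, axis=1)
sub_group = data[row_mask]

# A key that matches no row produces an empty selection.
absent = np.array([9.0, 9.0, 9.0])
missing = data[np.all(data[:, [0, 1, 2]] == absent, axis=1)]
print(sub_group.shape, missing.shape)
```

Checking the shape of the selection before indexing into it is a cheap way to rule this case out.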

Do you have any idea what’s going wrong here? Thanks!

Best,
Fan

Take a look at the documentation for set_subtensor. It does not alter the input variable, but rather returns a copy with the changes made.

I have changed it like this:


for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[GroupId1 == group]
    sum_ad = tt.sum(sub_group[:,-1].tolist())
    sum_ad_gammma = sum_ad**gamma
    yydat_matrix = tt.set_subtensor(yydat_matrix[index,[0,1,2,3]], group)  
    yydat_matrix = tt.set_subtensor(yydat_matrix[index,4], sum_ad_gammma)

But now it gives me this error:


RecursionError: maximum recursion depth exceeded

I increased the recursion limit, but I still cannot get correct results.

Almost there. How many groups do you have? If it’s quite a few, then the recursion limit error probably comes from these self-referential set_subtensor calls. You can do something like:

g_idx, g_vals, gam_vals = list(), list(), list()
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[GroupId1 == group]
    sum_ad = tt.sum(sub_group[:,-1].tolist())
    sum_ad_gamma = sum_ad**gamma
    # index is a single int here, so append rather than extend
    g_idx.append(index)
    g_vals.append(group)
    gam_vals.append(sum_ad_gamma)

# Only two set_subtensor calls in total, made after the loop
yydat_matrix = tt.set_subtensor(yydat_matrix[g_idx,[0,1,2,3]], g_vals)  
yydat_matrix = tt.set_subtensor(yydat_matrix[g_idx,4], gam_vals)

Note that plain lists likely won’t work here. My guess is you’ll need a correctly shaped matrix or array, but you get the idea. This way you’re only adding two set_subtensor nodes to the computation graph.
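A minimal NumPy sketch of the same pattern, collecting every per-group value first and writing them with one assignment per block of columns, might look like the following. The data, `gamma = 0.5`, and the names `xxdat`, `group_ids`, and `yydat` are assumptions for illustration only. NumPy assignment stands in for set_subtensor; with symbolic scalars the collected values would presumably need to be combined with `tt.stack` before the single set_subtensor call.

```python
import numpy as np

# Hypothetical data: four key columns plus one value column.
xxdat = np.array([
    [1, 1, 0, 2, 0.5],
    [1, 1, 0, 2, 1.5],
    [2, 0, 1, 3, 4.0],
    [2, 0, 1, 3, 5.0],
    [2, 0, 1, 3, 7.0],
])
group_ids = np.unique(xxdat[:, :4], axis=0)  # one row per distinct key
gamma = 0.5  # assumed exponent, a fixed number in this sketch

# Collect one value per group; no assignment happens inside the loop.
g_vals = []
for group in group_ids:
    mask = np.all(xxdat[:, :4] == group, axis=1)
    g_vals.append(xxdat[mask, -1].sum() ** gamma)

# Two assignments in total, regardless of the number of groups.
yydat = np.zeros((len(group_ids), 5))
yydat[:, :4] = group_ids
yydat[:, 4] = np.stack(g_vals)
print(yydat)
```

The point of the design is that the number of assignment operations stays constant as the number of groups grows, instead of adding one per loop iteration.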

Hi Chris,

Thanks! I got your idea and tried two ways:

  1. First, I used the method you suggested to create an array:

ls_ad_gamma = np.zeros((len(GroupId1), 1), dtype=object)
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[np.all(xxdat_matrix[:,[0,1,2,3]] == group, axis = 1)]
    sum_ad = tt.sum(sub_group[:,-1].tolist())
    ls_ad_gamma[index] = sum_ad**gamma
yydat_matrix = tt.set_subtensor(yydat_matrix[:,4], ls_ad_gamma)    
yydat_matrix = tt.set_subtensor(yydat_matrix[:,[0,1,2,3]], GroupId1)

The problem with this is that the NumPy array ls_ad_gamma cannot be assigned into the yydat_matrix tensor. It looks like this:


array([[Elemwise{pow,no_inplace}.0],
       [Elemwise{pow,no_inplace}.0],
       [Elemwise{pow,no_inplace}.0],
       ...,
       [Elemwise{pow,no_inplace}.0]], dtype=object)

and got this error:


('Cannot convert [[Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n ...\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]] to TensorType', )

Changing ls_ad_gamma to ls_ad_gamma.tolist() still does not resolve it. Also, if we create ls_ad_gamma as a tensor instead, we are back to doing the recursive item assignment.

  2. Then I tried to simplify my loop:

    yydat_matrix = tt.zeros((len(GroupId1), 5), dtype=T.config.floatX)
    
    for index, group in enumerate(GroupId1):
        sub_group = xxdat_matrix[np.all(xxdat_matrix[:,[0,1,2,3]] == group, axis = 1)]
        sum_ad = tt.sum(sub_group[:,-1].tolist())
        sum_ad_gamma = sum_ad**gamma
        yydat_matrix = tt.set_subtensor(yydat_matrix[index,4], sum_ad_gamma)

    yydat_matrix = tt.set_subtensor(yydat_matrix[:,[0,1,2,3]], GroupId1)

But I still get the recursion limit error. Any idea about this?

Thanks!
Fan