Using Pandas within Pymc3 context?

fgong · June 10, 2019, 5:11pm

Hi Chris,

Thanks for your suggestion! I have decided to change all pandas operations and calculation to numpy but still i faced some problems. Here are the old code:


xxdat['ad'] = (xxdat['pde_norm'].values) * (lam[xxdat['chnl_cd'].values-1]**(xxdat['gap_nrx_promo'].values))
yydat = xxdat.groupby(['spcl_st_cd','nrx_wk', 'chnl_cd', 'nrx_norm'], as_index= False).agg({'ad':test_sum})

and for the first row, the problem is that lam is a symbolic tensor, so even though we have for example 100 rows, after the multiplication it will still result in only one value: Elemwise{mul,no_inplace}.0. But behind the scene it has the shape (100,1). So I decided to do this calculation row by row with below code and also create groupby function using unique:


basic_model = pm.Model()
with basic_model:
    
    ### prior for unknown parameters
    # prior for lambda
    BoundedNormal_lam = pm.Bound(pm.Normal, lower=lam_lbd, upper=lam_ubd)
    lam = BoundedNormal_lam('lam', mu=lam_mean, sd=lam_sd, shape = 8)
    # prior for gamma
    BoundedNormal_gamma = pm.Bound(pm.Normal, lower=0.2, upper=0.8)
    gamma = BoundedNormal_gamma('gamma', mu=0.5, sd=0.1)
    # prior for beta
    BoundedNormal_beta = pm.Bound(pm.Normal, lower=b_lbd, upper=b_ubd)
    beta = BoundedNormal_beta('beta', mu=b_mean, sd=b_sd, shape = 8)
    # prior for v
    v = pm.Uniform('v', lower = 1e-2, upper =1e2)
    

    # decayed promotion: pde * lam_j^{t-l}
    ad = np.apply_along_axis(caclculate_ad, 1, xxdat_matrix)
    xxdat_matrix = np.append(xxdat_matrix,ad.reshape(-1,1),axis = 1)
    
    yydat_matrix = np.empty((len(GroupId1), 5), dtype = 'object')
    
    for index, group in enumerate(GroupId1):
        sub_group = xxdat_matrix[GroupId1 == group]
        sum_ad = tt.sum(sub_group[:,-1].tolist())
        sum_ad_gammma = sum_ad**gamma
        yydat_matrix[index,:] = np.append(sub_group[:,[0,1,2,3]][0].reshape(1,-1), np.array([sum_ad_gammma]).reshape(1,-1), axis = 1)
    
    bx =  np.apply_along_axis(calculate_bx, 1, yydat_matrix)
    mu = tt.mean(bx.tolist())
    
    ### likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd = v, observed = yydat_matrix[:,2])

where some pre-calculated staff and functions are as below:


### Transfer pandas data frame into numpy matrix
xxdat_matrix = xxdat.values
### Get the group ID 
GroupId1 = np.unique(xxdat_matrix[:,[0,1,2,3]], axis = 0, return_inverse=True)[1]

def caclculate_ad(x):
    '''x is a numpy array contains all the basic information'''
    pde_norm = x[5]
    channel = int(x[3])-1
    return pde_norm*((lam[channel])**(x[6]))
def calculate_bx(x):
    '''x is a numpy array contains ad_sum_r information'''
    channel =  int(x[3])-1
    ad_sum_r = x[4]
    return beta[channel]*ad_sum_r

Now within pymc3.basic_model context it is working fine. But the issues is, when i try to sample from it, lets say using find_MAP, i got this error:


Exception: ('The following error happened while compiling the node', Elemwise{Composite{(Switch(Cast{int8}((GE(i0, i1) * LE(i0, i2))), (i3 * i4 * (i5 - i4) * (i6 + i7)), i8) + (i9 * (Switch(EQ(i10, i8), i8, (i11 * log(i10) * i12)) + Switch(EQ(i13, i8), i8, (i14 * log(i13) * i15))

It is a very long error and not intuitive at all to me… any idea on this?

Thanks so much for your time!

Best,
Fan

Topic		Replies	Views
Help for Theano calculation Questions	8	1112	June 11, 2019
Is it possible to convert column from pandas to theano.shared? Questions	12	2416	September 24, 2021
Create model matrix Questions	7	1492	June 23, 2021
Combining parameters into single tensor Questions	19	3533	October 30, 2017
Passing pymc3 distribution variable to theano function Questions	3	1636	September 23, 2019

Using Pandas within Pymc3 context?

Related topics