Try using the type `theano.config.floatX`?

# Using Pandas within Pymc3 context?

**Thanks Chartl. But it still didn't work. Error message:**

```
KeyError                                  Traceback (most recent call last)
/anaconda3/lib/python3.7/site-packages/theano/tensor/type.py in dtype_specs(self)
    268                 'complex64': (complex, 'theano_complex64', 'NPY_COMPLEX64')
--> 269             }[self.dtype]
    270         except KeyError:

KeyError: 'object'

During handling of the above exception, another exception occurred:

TypeError: Unsupported dtype for TensorType: object
```

**Let’s just focus on one line of the code:**

`xxdat['ad'] = ((xxdat['pde_norm'].values) * (lam[xxdat['chnl_cd'].values-1]**(xxdat['gap_nrx_promo'].values))).astype(T.config.floatX)`

**I added a print statement after that, and got this result:**

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1000 entries, 1029454 to 607395
Data columns (total 8 columns):
spcl_st_cd       1000 non-null int64
nrx_wk           1000 non-null int64
nrx_norm         1000 non-null float64
chnl_cd          1000 non-null int64
promo_wk         1000 non-null int64
pde_norm         1000 non-null float64
gap_nrx_promo    1000 non-null int64
ad               1000 non-null object
dtypes: float64(2), int64(5), object(1)
memory usage: 70.3+ KB
None
```

**So even though we cast the data type, the generated column is still object type. Do you know the reason for this?**

You can always try to unwrap into an array

```
xxdat_ad = np.array([float(u) for u in xxdat['ad']])
```

and see where the error occurs.

By further checking the column it generated, it looks like each entry is a theano expression object. That is the reason it is the object type.

You're mixing theano and numpy. `ad_sum_r` is a random variable, and so are all downstream computations that involve it. These computations should be performed in theano, to register them on the DAG, rather than through numpy/pandas, which won't know what to do with a theano variable.
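To see the point concretely, here is a minimal sketch (toy names, not the thread's data) of what happens when a numpy array meets a symbolic theano variable:

```
import numpy as np
import theano.tensor as tt

lam = tt.vector('lam')            # stands in for the symbolic prior
pde_norm = np.array([0.1, 0.2, 0.3])
chnl = np.array([1, 2, 1])

ad = pde_norm * lam[chnl - 1]     # the result is a symbolic graph node, not numbers
print(type(ad))                   # theano.tensor.var.TensorVariable
```

Storing such nodes in a DataFrame column forces dtype `object`, and no `.astype(...)` can turn symbols into floats; they only get values when the graph is evaluated.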

I think you are right. I have never used theano before, so it is a bit confusing for me.

In a nutshell, what I want to do here is to generate a column based on the random variable `lam`, and then use this generated column to do some further aggregations; the final generated column, `bx`, will be used directly for calculating the likelihood.

In that case, how can I separate the pandas/numpy calculations from theano?

In this case it would mean manually grouping and aggregating, so that you can use `theano.tensor.sum` instead of `np.sum`, and `theano.tensor.mean` instead of `np.mean`. I don't think that the theano functions will play nicely with pandas groupby/agg; but I could be wrong.
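For what it's worth, a grouped sum can also be done in theano without a python-level loop via a scatter-add. A rough, untested sketch (all names illustrative), assuming dense integer group ids are precomputed outside the model:

```
import numpy as np
import theano.tensor as tt

# dense group ids, computed once on the plain data
keys = np.array([[1, 10], [1, 10], [2, 20], [1, 30]])
group_idx = np.unique(keys, axis=0, return_inverse=True)[1]   # e.g. [0, 0, 2, 1]
n_groups = int(group_idx.max()) + 1

ad = tt.vector('ad')   # symbolic per-row values
# inc_subtensor accumulates repeated indices (like np.add.at), giving per-group sums
group_sums = tt.inc_subtensor(tt.zeros((n_groups,))[group_idx], ad)
```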

Hi Chartl,

Thanks, man. But grouping is not that easy to do without pandas.

Hi Chris,

I have revised my code based on your suggestion; here is my main code now:

```
with pm.Model() as model:
    ### priors for unknown parameters
    # prior for lambda
    BoundedNormal_lam = pm.Bound(pm.Normal, lower=lam_lbd, upper=lam_ubd)
    lam = BoundedNormal_lam('lam', mu=lam_mean, sd=lam_sd, shape=8)
    # prior for gamma
    BoundedNormal_gamma = pm.Bound(pm.Normal, lower=0.2, upper=0.8)
    gamma = BoundedNormal_gamma('gamma', mu=0.5, sd=0.1)
    # prior for beta
    BoundedNormal_beta = pm.Bound(pm.Normal, lower=b_lbd, upper=b_ubd)
    beta = BoundedNormal_beta('beta', mu=b_mean, sd=b_sd, shape=8)
    # prior for v
    v = pm.Uniform('v', lower=1e-2, upper=1e2)
    # decayed promotion: pde * lam_j^{t-l}
    xxdat['ad'] = (xxdat['pde_norm'].values) * (lam[xxdat['chnl_cd'].values - 1] ** (xxdat['gap_nrx_promo'].values))
    # sum of decayed promotion for each NRx + each channel: \sum_t pde * lam_j^{t-l}
    yydat = xxdat.groupby(['spcl_st_cd', 'nrx_wk', 'chnl_cd', 'nrx_norm'], as_index=False).agg({'ad': test_sum})
    yydat.rename(columns={'ad': 'ad_sum'}, inplace=True)
    yydat['ad_sum_r'] = yydat['ad_sum'] ** gamma  # adstock diminishing factor: (\sum_t pde * lam_j^{t-l})^r
    # normalize each channel's adstock
    yydat['ad_norm'] = (yydat['ad_sum_r'].values) / (yydat.groupby('chnl_cd', as_index=False).agg({'ad_sum_r': test_mean})['ad_sum_r'][yydat['chnl_cd'].values - 1]).values
    # sum up adstock multiplied by the impact factor over channels
    yydat['bx'] = beta[yydat['chnl_cd'].values - 1] * yydat['ad_norm']
    zzdat = yydat.groupby(['spcl_st_cd', 'nrx_wk', 'nrx_norm'], as_index=False).agg({'bx': test_sum})
    # get the final mu
    mu = test_mean(zzdat['bx'])
    ### likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=v, observed=zzdat['nrx_norm'])
```

And here are these two test functions:

```
def test_sum(x):
    return tt.sum(x.to_list())

def test_mean(x):
    return tt.mean(x.to_list())
```

It runs successfully when the input data has fewer than 256 rows. But once it is larger than that, I get this error:

```
fatal error: bracket nesting level exceeded maximum of 256.
```

I have tried changing the maximum to some larger number, but that gives some other errors, so I think I still need to revise my code. Do you have any more ideas about my case? Thank you!

The type conversion is likely quite costly even if the pandas aggregation function works. I think you should consider doing the conversion and data manipulation outside of `pm.Model()`, and operating only on numpy arrays inside the model.

Hi Junpeng,

Thanks for your suggestion. But the thing here is that all the aggregations involve the prior parameters, so they cannot be moved out of the context. For example, the first row:

```
xxdat['ad'] = (xxdat['pde_norm'].values) * (lam[xxdat['chnl_cd'].values-1]**(xxdat['gap_nrx_promo'].values))
```

It generates the new column `ad` using the parameter `lam`, and this column is then used in the `groupby` call on the next row.

I also feel that too many aggregations within the `pm.Model` context can definitely lower the speed, but I could not find a better way to solve this.

My suggestion would be not to group using pandas, but instead to do it yourself. My guess is that the aggregation is implemented like a reduce, so theano is getting the equivalent of

```
tt.sum(a[0], tt.sum(a[1], tt.sum(a[2], ...
```

My suggestion is to convert the initial aggregated data frame (`yydat`, which initially contains only data, so you can use `np.sum`) into a dictionary keyed by your grouping variables, and to operate on the dictionary instead of using pandas functions.
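One way to read this (a sketch only, with toy stand-ins for the thread's variables) is to grab each group's row positions once, on plain data, and then do every aggregation symbolically:

```
import pandas as pd
import theano.tensor as tt

# toy stand-ins for the thread's xxdat and its symbolic 'ad' column
xxdat = pd.DataFrame({'spcl_st_cd': [1, 1, 2], 'nrx_wk': [10, 10, 20]})
ad = tt.vector('ad')   # symbolic per-row values

# positions of each group's rows, computed once outside the symbolic part
group_pos = xxdat.groupby(['spcl_st_cd', 'nrx_wk']).indices

# aggregate each group in theano, so everything stays on the graph
ad_sums = {key: tt.sum(ad[pos]) for key, pos in group_pos.items()}
```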

Hi Chris,

Thanks for your suggestion! I have decided to change all the pandas operations and calculations to numpy, but I still face some problems. Here is the old code:

```
xxdat['ad'] = (xxdat['pde_norm'].values) * (lam[xxdat['chnl_cd'].values-1]**(xxdat['gap_nrx_promo'].values))
yydat = xxdat.groupby(['spcl_st_cd','nrx_wk', 'chnl_cd', 'nrx_norm'], as_index= False).agg({'ad':test_sum})
```

and for the first row, the problem is that `lam` is a symbolic tensor, so even though we have, for example, 100 rows, the multiplication still results in only one value: `Elemwise{mul,no_inplace}.0`. But behind the scenes it has the shape `(100,1)`. So I decided to do this calculation row by row with the code below, and also to create a groupby function using `unique`:

```
basic_model = pm.Model()
with basic_model:
    ### priors for unknown parameters
    # prior for lambda
    BoundedNormal_lam = pm.Bound(pm.Normal, lower=lam_lbd, upper=lam_ubd)
    lam = BoundedNormal_lam('lam', mu=lam_mean, sd=lam_sd, shape=8)
    # prior for gamma
    BoundedNormal_gamma = pm.Bound(pm.Normal, lower=0.2, upper=0.8)
    gamma = BoundedNormal_gamma('gamma', mu=0.5, sd=0.1)
    # prior for beta
    BoundedNormal_beta = pm.Bound(pm.Normal, lower=b_lbd, upper=b_ubd)
    beta = BoundedNormal_beta('beta', mu=b_mean, sd=b_sd, shape=8)
    # prior for v
    v = pm.Uniform('v', lower=1e-2, upper=1e2)
    # decayed promotion: pde * lam_j^{t-l}
    ad = np.apply_along_axis(calculate_ad, 1, xxdat_matrix)
    xxdat_matrix = np.append(xxdat_matrix, ad.reshape(-1, 1), axis=1)
    yydat_matrix = np.empty((len(GroupId1), 5), dtype='object')
    for index, group in enumerate(GroupId1):
        sub_group = xxdat_matrix[GroupId1 == group]
        sum_ad = tt.sum(sub_group[:, -1].tolist())
        sum_ad_gamma = sum_ad ** gamma
        yydat_matrix[index, :] = np.append(sub_group[:, [0, 1, 2, 3]][0].reshape(1, -1), np.array([sum_ad_gamma]).reshape(1, -1), axis=1)
    bx = np.apply_along_axis(calculate_bx, 1, yydat_matrix)
    mu = tt.mean(bx.tolist())
    ### likelihood (sampling distribution) of observations
    Y_obs = pm.Normal('Y_obs', mu=mu, sd=v, observed=yydat_matrix[:, 2])
```

where some pre-calculated stuff and helper functions are as below:

```
### convert the pandas data frame into a numpy matrix
xxdat_matrix = xxdat.values

### get the group IDs
GroupId1 = np.unique(xxdat_matrix[:, [0, 1, 2, 3]], axis=0, return_inverse=True)[1]

def calculate_ad(x):
    '''x is a numpy array containing all the basic information'''
    pde_norm = x[5]
    channel = int(x[3]) - 1
    return pde_norm * ((lam[channel]) ** (x[6]))

def calculate_bx(x):
    '''x is a numpy array containing the ad_sum_r information'''
    channel = int(x[3]) - 1
    ad_sum_r = x[4]
    return beta[channel] * ad_sum_r
```

Now within the pymc3 `basic_model` context it is working fine. But the issue is, when I try to sample from it, let's say using `find_MAP`, I get this error:

```
Exception: ('The following error happened while compiling the node', Elemwise{Composite{(Switch(Cast{int8}((GE(i0, i1) * LE(i0, i2))), (i3 * i4 * (i5 - i4) * (i6 + i7)), i8) + (i9 * (Switch(EQ(i10, i8), i8, (i11 * log(i10) * i12)) + Switch(EQ(i13, i8), i8, (i14 * log(i13) * i15))
```

It is a very long error and not intuitive at all to me… any idea on this?

Thanks so much for your time!

Best,

Fan

I'm actually quite surprised this is working at all, because of the mixing of numpy and theano. Try using

`yydat_matrix = tt.zeros((len(GroupId1), 5), dtype=theano.config.floatX)`

and `tt.set_subtensor(yydat_matrix, ...)` in place of `yydat_matrix[index,:] =`

And you'll probably need to hand-implement `apply_along_axis` (see https://stackoverflow.com/questions/49581702/what-is-the-apply-along-axis-tensorflow-equivalent)
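In this particular case you may not even need `apply_along_axis`: the per-row helper only indexes and multiplies, so (assuming the same column layout as `calculate_ad` above; an untested sketch with toy data) it could likely be written as a single vectorized expression:

```
import numpy as np
import theano.tensor as tt

lam = tt.vector('lam')                       # the symbolic prior from the model
xxdat_matrix = np.array([[0, 0, 0, 1, 0, 0.5, 2.0],
                         [0, 0, 0, 2, 0, 0.3, 1.0]], dtype=object)

# column layout assumed from calculate_ad: 5 = pde_norm, 3 = channel code, 6 = exponent
pde_norm = xxdat_matrix[:, 5].astype('float64')
chnl = xxdat_matrix[:, 3].astype('int64') - 1
gap = xxdat_matrix[:, 6].astype('float64')

ad = pde_norm * lam[chnl] ** gap             # one symbolic vector, no python loop over rows
```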

Hi Chris,

Thank you so much! I finally got some results based on your suggestions. Really learned a lot from you these days!

If there are further issues I may bother you again.

Best,

Fan

Hi Chris,

I just faced another issue that is maybe only related to theano: one of my groupby tasks was implemented with the following code:

```
zzdat_matrix = tt.zeros((len(GroupId3), 4), dtype=T.config.floatX)
for index, group in enumerate(GroupId3):
    sub_group = yydat_matrix[tt.all(tt.eq(yydat_matrix[:, [0, 1, 2]].astype('float32'), group), axis=1)]
    bx = tt.sum(beta[(sub_group[:, 3].astype('int32')) - 1] * sub_group[:, 4])
    tt.set_subtensor(zzdat_matrix[index, [0, 1, 2]], group)
    tt.set_subtensor(zzdat_matrix[index, 3], bx)
```

But it runs forever without any error. Then I found out that maybe there is something wrong with `sub_group`, because if I try `sub_group[0]`, it gives me an index-out-of-bounds error. But I don't see anything wrong with my boolean indexing of `yydat_matrix` here.

To give a quick recap: `yydat_matrix` is a symbolic matrix with 5 columns, and `GroupId3` is a numpy array with shape 3xn; since `tt.eq` works element-wise, I need to add a `tt.all` after it.

Do you have any idea what’s going wrong here? Thanks!

Best,

Fan

Take a look at the documentation for `set_subtensor`. It does not alter the input variable, but rather returns a copy with the changes made.
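For example (a minimal illustration, not the thread's code):

```
import theano.tensor as tt

x = tt.zeros((3,))
y = tt.set_subtensor(x[0], 1.0)   # x is untouched; y is a new node carrying the update
# so the result must be assigned back, e.g. x = tt.set_subtensor(x[0], 1.0)
```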

I have changed it like this:

```
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[GroupId1 == group]
    sum_ad = tt.sum(sub_group[:, -1].tolist())
    sum_ad_gamma = sum_ad ** gamma
    yydat_matrix = tt.set_subtensor(yydat_matrix[index, [0, 1, 2, 3]], group)
    yydat_matrix = tt.set_subtensor(yydat_matrix[index, 4], sum_ad_gamma)
```

But now it gives me this error:

```
RecursionError: maximum recursion depth exceeded
```

I increased the recursion limit, but still cannot get the correct results.

Almost there. How many groups do you have? If it's quite a few, then the recursion limit probably comes from these self-referential `set_subtensor` calls. You can do something like

```
g_idx, g_vals, gam_vals = list(), list(), list()
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[GroupId1 == group]
    sum_ad = tt.sum(sub_group[:, -1].tolist())
    sum_ad_gamma = sum_ad ** gamma
    g_idx.append(index)
    g_vals.append(group)
    gam_vals.append(sum_ad_gamma)
yydat_matrix = tt.set_subtensor(yydat_matrix[g_idx, [0, 1, 2, 3]], g_vals)
yydat_matrix = tt.set_subtensor(yydat_matrix[g_idx, 4], gam_vals)
```

Now, `list()` likely won't work; my guess is you'll need a correctly-shaped matrix or array, but you get the idea. This way you're only adding two `set` nodes to the computation graph.
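One concrete shape this could take (an untested sketch reusing the names above): collect the symbolic scalars in plain python lists and stack them into a single vector, so each column needs only one `set_subtensor` call:

```
# continuing the names from the sketch above
gam_vec = tt.stack(gam_vals)   # list of symbolic scalars -> one symbolic vector
yydat_matrix = tt.set_subtensor(yydat_matrix[g_idx, 4], gam_vec)
```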

Hi Chris,

Thanks! I got your idea and tried two ways:

- First, I used the method you suggested to create an array:

```
ls_ad_gamma = np.zeros((len(GroupId1), 1), dtype=object)
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[np.all(xxdat_matrix[:, [0, 1, 2, 3]] == group, axis=1)]
    sum_ad = tt.sum(sub_group[:, -1].tolist())
    ls_ad_gamma[index] = sum_ad ** gamma
yydat_matrix = tt.set_subtensor(yydat_matrix[:, 4], ls_ad_gamma)
yydat_matrix = tt.set_subtensor(yydat_matrix[:, [0, 1, 2, 3]], GroupId1)
```

The problem with this is that the numpy array `ls_ad_gamma` cannot be added to the `yydat_matrix` tensor. It looks like this:

```
array([[Elemwise{pow,no_inplace}.0],
[Elemwise{pow,no_inplace}.0],
[Elemwise{pow,no_inplace}.0],
...,
[Elemwise{pow,no_inplace}.0]], dtype=object)
```

and got this error:

```
('Cannot convert [[Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n ...\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]\n [Elemwise{pow,no_inplace}.0]] to TensorType', )
```

Changing `ls_ad_gamma` to `ls_ad_gamma.tolist()` still does not resolve it. And if we instead create `ls_ad_gamma` as a tensor type, then we are back to the recursive item assignment again.

- Then I tried to simplify my looping:

```
yydat_matrix = tt.zeros((len(GroupId1), 5), dtype=T.config.floatX)
for index, group in enumerate(GroupId1):
    sub_group = xxdat_matrix[np.all(xxdat_matrix[:, [0, 1, 2, 3]] == group, axis=1)]
    sum_ad = tt.sum(sub_group[:, -1].tolist())
    sum_ad_gamma = sum_ad ** gamma
    yydat_matrix = tt.set_subtensor(yydat_matrix[index, 4], sum_ad_gamma)
yydat_matrix = tt.set_subtensor(yydat_matrix[:, [0, 1, 2, 3]], GroupId1)
```

but it still hits the recursion limit error. Any idea about this?

Thanks!

Fan