How to name categorical variables?

Thanks @junpenglao
The patsy library seems very flexible and should be able to do a lot. I will explore how to use it to improve my code.

In the meantime, this is how I solve the categorical variables problem:

‘dow’ is a categorical variable, “Day of Week”, which naturally has 7 values.

Step 1: dummy encode
data = pd.get_dummies(data,prefix='dow',columns=['dow'],drop_first=True)
Make sure drop_first=True is set so the redundant column is removed: the first category becomes the baseline, leaving 6 dummy columns for the 7 days.

Step 2: set up data
dows = [col for col in data if str(col).startswith('dow_')]
dow_cols = data[dows]
no_dow = len(dows)
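
As a quick check of Steps 1 and 2, here is a minimal sketch on a made-up frame. The `y` column and all the values are hypothetical and only there for illustration. With 7 distinct days and drop_first=True, it should leave 6 dummy columns, and rows for the dropped first day are all zeros.

```python
import pandas as pd

# Hypothetical toy data: 'dow' coded 0..6, 'y' is a made-up response.
data = pd.DataFrame({
    "dow": [0, 1, 2, 3, 4, 5, 6, 0, 3],
    "y": [1.0, 2.0, 1.5, 0.5, 2.5, 3.0, 1.0, 1.2, 0.7],
})

# Step 1: dummy encode, dropping the first level (dow == 0) as the baseline.
data = pd.get_dummies(data, prefix="dow", columns=["dow"], drop_first=True)

# Step 2: collect the dummy columns by their prefix.
dows = [col for col in data if str(col).startswith("dow_")]
dow_cols = data[dows]
no_dow = len(dows)

print(dows)    # names of the remaining dummy columns
print(no_dow)  # number of dummy columns (one fewer than the number of days)
```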

Step 3: set up model
b = pm.Normal('b', mu=0., sd=0.5, shape=no_dow)  # Note: no_dow is one fewer than the actual number of categories

k = pm.math.matrix_dot(dow_cols,b) 

This uses Theano's matrix dot product to compute the sum of each dummy column times its coefficient, so it handles any number of categories without typing out each one by hand.
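
To see why the dot product replaces the hand-written sum, here is a small NumPy sketch (NumPy stands in for Theano here, and the matrix `X` and coefficients are made up). It assumes 4 categories with the baseline dropped, so 3 dummy columns.

```python
import numpy as np

# Hypothetical dummy matrix: 4 observations, 3 dummy columns.
X = np.array([
    [0, 0, 0],  # baseline category: all dummies are zero
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)
b = np.array([0.2, -0.1, 0.4])  # made-up coefficients, one per dummy column

# Dot product: works for any number of dummy columns.
k_dot = X.dot(b)

# Equivalent hand-written sum, one term per category.
k_manual = b[0] * X[:, 0] + b[1] * X[:, 1] + b[2] * X[:, 2]

print(np.allclose(k_dot, k_manual))
```

Each non-baseline row simply picks out its own coefficient, while the baseline row gets 0, so the baseline level has to be carried by whatever intercept the rest of the model defines.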

I think the patsy library could be used to automate some of these steps, but this will work for now.
