Thanks @junpenglao
The patsy library seems very flexible and should be able to do a lot of things. I will explore on my own how to use it to improve my code.
In the meantime, this is how I solve the categorical variables problem:
'dow' is a categorical variable, "Day of Week", which naturally has 7 values.
Step 1: dummy encode
data = pd.get_dummies(data,prefix='dow',columns=['dow'],drop_first=True)
Make sure to set drop_first=True so that the redundant column is removed (the dropped category then acts as the baseline).
Step 2: set up data
dows = [col for col in data if str(col).startswith('dow_')]
dow_cols = data[dows]
no_dow = len(dows)
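To make Steps 1 and 2 concrete, here is a minimal sketch on a toy DataFrame. The data are made up for illustration (the integer coding 0..6 for 'dow' and the 'y' column are my assumptions, not the real dataset), and the cast to float is only there because newer pandas versions return boolean dummy columns.

```python
import pandas as pd

# Toy data (hypothetical): 'dow' coded 0..6, two weeks of observations
data = pd.DataFrame({"dow": list(range(7)) * 2, "y": list(range(14))})

# Step 1: dummy encode, dropping the first category (dow == 0) as the baseline
data = pd.get_dummies(data, prefix="dow", columns=["dow"], drop_first=True)

# Step 2: collect the dummy columns
dows = [col for col in data if str(col).startswith("dow_")]
dow_cols = data[dows].astype(float)  # cast in case pandas returns bool dummies
no_dow = len(dows)

print(dows)    # six column names, dow_1 through dow_6
print(no_dow)  # one fewer than the 7 categories
```

Rows that belong to the dropped category have zeros in every dummy column, which is why only 6 coefficients are needed for 7 categories.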
Step 3: set up model
b = pm.Normal('b', mu=0., sd=0.5, shape=no_dow) # Note no_dow is 1 fewer than the actual number of categories
k = pm.math.matrix_dot(dow_cols,b)
This uses theano's matrix dot product to do the multiply-and-sum over the dummy columns, so it handles any number of categories without manually typing out a term for each one.
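As a quick sanity check of what the dot product is doing, here is a small NumPy sketch (NumPy stands in for theano here, and the matrix and coefficient values are made up). It shows that the matrix-vector product equals the sum of the per-category terms you would otherwise write by hand.

```python
import numpy as np

# Hypothetical dummy matrix: 3 observations, 6 dummy columns
# (7 categories with the first one dropped)
X = np.array([
    [0, 0, 0, 0, 0, 0],  # baseline category: all dummies are zero
    [1, 0, 0, 0, 0, 0],  # first non-baseline category
    [0, 0, 0, 0, 0, 1],  # last category
], dtype=float)
b = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])  # made-up coefficients

# Matrix-vector product: one value per observation
k = X.dot(b)

# The same thing written out term by term, one column at a time
k_manual = sum(X[:, j] * b[j] for j in range(X.shape[1]))

assert np.allclose(k, k_manual)
print(k)
```

Each row simply picks out the coefficient of its own category, and the baseline row contributes 0.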
I think the patsy library could be used to automate some of these steps, but this will work for now.