What is the best/cleanest way to model the interactions between categorical variables while using PyMC’s shape functionalities (e.g. coords, dims, data containers)?
The example I’m working on is an unpooled negative binomial model where type and region are categorical variables: nv ~ type + region + region:type + enrollment_offset
I currently achieve this using the following code where in I create a new type_region field, but I feel as though there must be a cleaner way of doing this by referencing the type & region dims without creating additional columns.
df_crime = pd.read_csv('https://raw.githubusercontent.com/proback/BeyondMLR/master/data/c_data.csv',
usecols=['nv', 'type', 'region', 'enroll1000'],
dtype={'type':'category', 'region':'category'}
)
# I want a better way than this following line of code!
sr_type_region = pd.Categorical(df_crime.type.astype(str) +'_'+ df_crime.region.astype(str))
coords={
'type': df_crime.type.cat.categories,
'region': df_crime.region.cat.categories,
'type_region':sr_type_region.categories, # How do I replace this?
}
with pm.Model(coords=coords) as model_05: #negative binomial model
# Data containers
d_type = pm.Data('type', df_crime.type.cat.codes)
d_region = pm.Data('region', df_crime.region.cat.codes)
d_enrollment_offset = pm.Data('enrollment_offset', df_crime.enroll1000.values)
d_crime = pm.Data('crime', df_crime.nv.values)
# Coefs Priors
b0 = pm.Normal('b0', mu=0, sigma=10)
b_type = pm.Normal('b_type', mu=0, sigma=10, dims='type')
b_region = pm.Normal('b_region', mu=0, sigma=10, dims='region')
b_type_region = pm.Normal('b_type_region', mu=0, sigma=10,
dims='type_region') # would like this to look more like dims=('type', 'region')
# Systematic component
y_sys = (
b0
+ b_type[df_crime.type.cat.codes] * d_type
+ b_region[df_crime.region.cat.codes] * d_region
+ b_type_region[sr_type_region.codes] * sr_type_region.codes # Is there a better way?
+ pm.math.log(d_enrollment_offset) # offset to weight crime counts by enrolments
)
# Inverse Link Function
y_mu = pm.math.exp(y_sys)
# Prior
y_alpha = pm.Exponential('y_alpha', 0.5)
# Random component
y = pm.NegativeBinomial('y', mu=y_mu, alpha=y_alpha, observed=d_crime)
Is there a better way to index b_type_region
? What is the best practice for setting the shape/dimensionality of interactions between categoricals in PyMC?
I’m fairly new to PyMC; Thanks for the help.