I’m trying to set up a Poisson GLM model which is somewhat hierarchical. I read through quite a few blog posts but couldn’t really manage to understand how to apply that to my case. Any help would be greatly appreciated!
A simplified version of my problem is the following:
I’m trying to fit an auto insurance portfolio with claim counts and exposure. I have several vehicle characteristics, for this limited example lets focus on car model (Model), Vehicle Class and Vehicle Size and an exposure column called simply exposure. Each car Model is associated with a vehicle class and vehicle size.
I would like to fit the following model, which in essence says that the car model frequency mean is a function of both vehicle class and vehicle size (and possibly an interaction between them).
α_class ~ Normal(0, 0.3)
α_size ~ Normal(0, 0.3)
α_model ~ Normal(α_class + α_size, σ_model)
ln(μ) = α_model + offset(ln(exposure))
claims ~ Poisson(μ)
My problem starts with trying to correctly juggling all of the dimensions of the problem, especially defining the α_model prior.
One of the trials I made looks like this:
coords = {
'size': df.VehicleClass.cat.categories,
'class': df.VehicleSize.cat.categories,
'models': df.Model.cat.categories,
}
size_codes = df.VehicleClass.cat.codes
class_codes = df.VehicleSize.cat.codes
with pm.Model(coords=coords) as m:
α_class = pm.Normal("α_class", 0, 0.3, dims="class")
α_size = pm.Normal("α_class", 0, 0.3, dims="size")
# What I want to do that wouldn't work
# because α_model should have dims='models'
# but α_class[class_codes] and α_size[size_codes
# have the number of observations as shape.
α_model = pm.Normal("α_model", α_class[class_codes] + α_size[size_codes], 0.3)
# I also tried to come up with a mapping class -> model & size -> model
class_to_model: dict = ...
α_model = pm.Normal("α_model", α_class[class_to_model[class_codes]] + ...)
# but again the dictionary translation will have the size of the number of observations.
# The rest of the code would be something like:
claims = pm.Poisson("claims", at.exp(α_model + at.log(df.exposure)), observed=df.claims)
I have made other attempts as well, but for some reason I can’t wrap my head around how to implement this properly.
Any help would be greatly appreciated!