I’m trying to set up a Poisson GLM model which is somewhat hierarchical. I read through quite a few blog posts but couldn’t really manage to understand how to apply that to my case. Any help would be greatly appreciated!
A simplified version of my problem is the following:
I’m trying to fit an auto insurance portfolio with claim counts and exposure. I have several vehicle characteristics, for this limited example lets focus on car model (Model
), Vehicle Class and Vehicle Size and an exposure column called simply exposure
. Each car Model is associated with a vehicle class and vehicle size.
I would like to fit the following model, which in essence says that the car model frequency mean is a function of both vehicle class and vehicle size (and possibly an interaction between them).
α_class ~ Normal(0, 0.3)
α_size ~ Normal(0, 0.3)
α_model ~ Normal(α_class + α_size, σ_model)
ln(μ) = α_model + offset(ln(exposure))
claims ~ Poisson(μ)
My problem starts with trying to correctly juggling all of the dimensions of the problem, especially defining the α_model
prior.
One of the trials I made looks like this:
coords = {
'size': df.VehicleClass.cat.categories,
'class': df.VehicleSize.cat.categories,
'models': df.Model.cat.categories,
}
size_codes = df.VehicleClass.cat.codes
class_codes = df.VehicleSize.cat.codes
with pm.Model(coords=coords) as m:
α_class = pm.Normal("α_class", 0, 0.3, dims="class")
α_size = pm.Normal("α_class", 0, 0.3, dims="size")
# What I want to do that wouldn't work
# because α_model should have dims='models'
# but α_class[class_codes] and α_size[size_codes
# have the number of observations as shape.
α_model = pm.Normal("α_model", α_class[class_codes] + α_size[size_codes], 0.3)
# I also tried to come up with a mapping class -> model & size -> model
class_to_model: dict = ...
α_model = pm.Normal("α_model", α_class[class_to_model[class_codes]] + ...)
# but again the dictionary translation will have the size of the number of observations.
# The rest of the code would be something like:
claims = pm.Poisson("claims", at.exp(α_model + at.log(df.exposure)), observed=df.claims)
I have made other attempts as well, but for some reason I can’t wrap my head around how to implement this properly.
Any help would be greatly appreciated!