How to set up a Hierarchical Poisson model where the *sum* of the rates at one level is controlled by a higher level

Hey guys,

I am at the pre-modelling stage, trying to wrap my head around what the structure of the model should look like …

An abstract description of the data: suppose something occurs at some rate (say, per day), lam. Suppose this something can be broken into sub-categories 1 through n, each with a different rate lam_1 … lam_n, and the total rate is the sum of the rates of the sub-processes: lam = sum_i lam_i. Then suppose each category j can be broken down further, so that each lam_j is itself a sum of smaller rates: lam_j = sum_k lam_jk.

Concrete example: in hockey, the number of shots on net per game might be lam ~ N(32, 7),
and let’s say there are 3 types of shots: high, med, and low danger (danger = probability of a goal),
so lam = lam_1 + lam_2 + lam_3 … and now suppose that high danger breaks into 3 sub-categories, lam_1 = lam_11 + lam_12 + lam_13, and medium and low also break into some sub-categories.
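(To make the nesting concrete with made-up numbers: lam = 32 might split as 10 + 12 + 10 across the three danger levels, and the high-danger 10 might split further as 4 + 3 + 3.)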

My question is: in this sort of set-up, is it possible to make a hierarchical model where lam is estimated, then lam_1, lam_2, lam_3 are estimated in such a way that lam_1 + lam_2 + lam_3 = lam, then each lam_1k is estimated in such a way that sum_k lam_1k = lam_1, and so on?

So basically, in a typical hierarchical model you would have a global mean mu, and then a mean mu_j for each of several sub-categories, where each mu_j is some deviation from the global mean. In this set-up we instead have several sub-rates lam_j that are related to the global rate
via lam_global = sum_j lam_j …


[side note: Ah, I see this is just one constraint on all the lam_j, so it is not enough constraints on its own… So I suppose we want to know the average lam_j^global and use that to constrain each lam_j, or alternatively the global average proportions p_1 … p_k such that lam_j = p_j * lam_global … which suggests a Dirichlet distribution (p_1, …, p_k) ~ Dirichlet(alpha_1, …, alpha_k) at each level of sub-categories.]
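A minimal sketch of that side note in PyMC (all names and the counts array are made up for illustration; I’m assuming per-game counts for 3 categories are observed):

```python
import numpy as np
import pymc as pm

# hypothetical per-game shot counts for 3 danger categories (games x categories)
counts = np.array([[9, 12, 11],
                   [7, 15, 10],
                   [10, 11, 14]])

with pm.Model():
    # total rate, with a Gamma prior moment-matched to mean ~32, sd ~7
    lam = pm.Gamma("lam", alpha=32.0**2 / 7.0**2, beta=32.0 / 7.0**2)
    # category shares: one Dirichlet, so the shares sum to 1 by construction
    p = pm.Dirichlet("p", a=np.ones(3))
    # sub-rates that sum to lam exactly, as the side note requires
    lam_j = pm.Deterministic("lam_j", p * lam)
    pm.Poisson("y", mu=lam_j, observed=counts)
    idata = pm.sample()
```

The same pattern can be nested: give each parent category its own Dirichlet over its children and multiply the shares down the tree.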


The reason I want to do this: if you ignored these nested categories and just went straight to the finest level (say 40 types of shots), you would get a bunch of very small rates with high variance that are highly correlated (for example, if you see a lot of high-danger sub-category 1 shots, you should also see a lot of high-danger sub-category 2 shots). Estimating the rates in bigger categories first, and then using those as a basis for the smaller sub-categories at each step, is meant to tie the levels together and help “regularize” these highly correlated rates.

Any thoughts on how one might go about this would be appreciated. (Forgive me if this explanation is not concrete enough to follow; if there is interest I will look into providing more explicit data/examples.)

Does a regular hierarchical model, where each group value is centered around the mean of the parent with a learned std, not work? You still only use the lowest level for prediction, but the information from other groups flows through the hierarchical structure.

Hi! Thanks for the reply …

Well, to my mind the lower-level category rates lam_j would not be on the same order of magnitude as the parent rate lam, and this is really the crux of the issue. Let’s say lam was around 3 and there were 6 sub-categories lam_1 … lam_6; then each lam_j would be in the ballpark of 0.5 as a naive estimate, since we require lam_1 + … + lam_6 = lam. But this does give me the simple idea of centering each lam_j around lam/n, where n is the number of categories …

—> Would estimating each lam_j ~ N(lam/n, sigma) make sense, where n = the number of sub-categories? [or replace N(lam/n, sigma) with some more appropriate distribution]

… or, to put it in terms of a Dirichlet distribution: (p_1, …, p_n) ~ Dirichlet(alpha_1, …, alpha_n) where all the alpha_j are approximately equal, so each p_j is approximately 1/n,
and lam_j = p_j * lam.
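To connect those two framings, here is a quick NumPy simulation (all numbers made up): with equal concentrations, the Dirichlet keeps each p_j centered at 1/n, and increasing alpha plays the role of shrinking sigma, with the sum-to-one constraint built in.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6  # number of sub-categories

# equal concentrations: each p_j has mean 1/n; larger alpha = tighter spread
for alpha in (1.0, 10.0, 100.0):
    p = rng.dirichlet(alpha * np.ones(n), size=100_000)
    print(f"alpha={alpha:>5}: mean p_1 = {p[:, 0].mean():.3f}, sd p_1 = {p[:, 0].std():.3f}")
```

So lam_j = p_j * lam with a symmetric Dirichlet is essentially the lam/n centering idea, but with positivity and the exact sum constraint for free.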

One way to do this would be to break this up into multiple models instead.

At the highest level, you predict the total number of shots daily. Then you use a binomial distribution to model the probability (i.e., rate) that each of those shots is low, medium, or high danger. When you go to make predictions for the rate of each shot type, just normalize the rates so that they sum to 1.

If your data is shot by shot, your binomial distribution could be a Bernoulli instead.

Yeah … maybe something like the following would work:
(1) estimate the total shots N via Poisson(lam_tot)
(2) estimate the probabilities (p1, p2, p3) of high, med, low danger via a Dirichlet distribution.*

*note: Here (p1, p2, p3) would be close to the league average, but there would be a fixed amount of variation, say sigma, which represents how much each team can vary around the mean … (that is, this would be a hierarchical model if, say, sigma ~ HalfNormal(0.2) is also estimated as a parameter)

(3) Plug N and (p1, p2, p3) into a multinomial distribution (rather than a binomial, because there are more than 2 categories).
(4) Sampling k1, k2, k3 ~ Multinomial(N, (p1, p2, p3)) gives counts k1, k2, k3 such that sum_j kj = N, which are estimates of the number of high, med, low danger shots …

(5) Then, at each level, one could repeat this process … for example, if you wanted to split the number of high-danger shots into 5 new sub-categories, k1 —> k11, k12, k13, k14, k15, one could estimate probabilities (p11, …, p15), and so on … (rough sketch of (1)-(5) below)
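Here is a rough PyMC sketch of steps (1)-(5), with everything hypothetical: the data arrays, the league-average shares, and my encoding of “variation around the league average” as a single concentration parameter.

```python
import numpy as np
import pymc as pm

# hypothetical data: per-game shot totals, counts per danger level,
# and counts for 2 sub-types of high-danger shots
totals = np.array([30, 35, 28, 33])
cat_counts = np.array([[8, 12, 10],
                       [10, 14, 11],
                       [7, 11, 10],
                       [9, 13, 11]])
high_sub_counts = np.array([[5, 3], [6, 4], [4, 3], [5, 4]])

league_p = np.array([0.3, 0.4, 0.3])  # assumed league-average shares

with pm.Model():
    # (1) total shot rate
    lam_tot = pm.Gamma("lam_tot", alpha=32.0**2 / 7.0**2, beta=32.0 / 7.0**2)
    pm.Poisson("N", mu=lam_tot, observed=totals)

    # (2) shares of high/med/low, concentrated near the league average;
    # larger conc = less team-to-team variation (stand-in for the sigma idea)
    conc = pm.HalfNormal("conc", sigma=50.0)
    p = pm.Dirichlet("p", a=conc * league_p)

    # (3)-(4) split each game's observed total across the 3 categories
    pm.Multinomial("k", n=totals, p=p, observed=cat_counts)

    # (5) repeat one level down: split high-danger counts into 2 sub-types
    p_high = pm.Dirichlet("p_high", a=np.ones(2))
    pm.Multinomial("k_high", n=cat_counts[:, 0], p=p_high, observed=high_sub_counts)

    idata = pm.sample()
```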

side note: Now, I believe if you let N vary over Poisson(lam_tot), then sampling over both distributions,
k1, k2, k3 ~ Multinomial(N, (p1, p2, p3)), N ~ Poisson(lam_tot),
is equivalent to sampling from independent Poisson distributions kj ~ Poisson(lam_j = pj * lam_tot) … in each case sum_j kj = N, but N varies from sample to sample …
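That equivalence (Poisson “thinning”) is easy to check by simulation; a quick NumPy sanity check with made-up numbers:

```python
import numpy as np

rng = np.random.default_rng(0)
lam_tot, p = 32.0, np.array([0.3, 0.4, 0.3])
n_sims = 100_000

# scheme A: N ~ Poisson(lam_tot), then (k1, k2, k3) ~ Multinomial(N, p)
N = rng.poisson(lam_tot, size=n_sims)
kA = np.array([rng.multinomial(n, p) for n in N])

# scheme B: kj ~ Poisson(pj * lam_tot), independently
kB = rng.poisson(p * lam_tot, size=(n_sims, 3))

# both should have mean and variance ~ p * lam_tot = [9.6, 12.8, 9.6],
# and the counts should be uncorrelated in both schemes
print(kA.mean(axis=0), kB.mean(axis=0))
print(kA.var(axis=0), kB.var(axis=0))
print(np.corrcoef(kA[:, 0], kA[:, 1])[0, 1],
      np.corrcoef(kB[:, 0], kB[:, 1])[0, 1])
```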

Yeah, that sounds like the basic idea I had in my head. I usually skip the Dirichlet step and just plug each rate into a separate binomial. You aren’t guaranteed that the rates sum to 1 in that case, but it’s usually pretty close, much simpler, and it samples faster. Then I just renormalize the rates to sum to 1 when making predictions. Adding in the Dirichlet distribution would be more technically sound, though.
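For what it’s worth, a minimal sketch of that simpler variant (hypothetical names and data; one independent Beta-Binomial per category, then renormalizing at prediction time):

```python
import numpy as np
import pymc as pm

# hypothetical data: per-game totals and per-category counts (3 danger levels)
totals = np.array([30, 35, 28, 33])
cat_counts = np.array([[8, 12, 10],
                       [10, 14, 11],
                       [7, 11, 10],
                       [9, 13, 11]])

with pm.Model():
    # one independent rate per category; nothing forces these to sum to 1
    p = pm.Beta("p", alpha=1.0, beta=1.0, shape=3)
    pm.Binomial("k", n=totals[:, None], p=p, observed=cat_counts)
    idata = pm.sample()

# renormalize the posterior-mean rates to sum to 1 before predicting
p_hat = idata.posterior["p"].mean(dim=("chain", "draw")).values
p_hat = p_hat / p_hat.sum()
```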

Gotcha, thanks for the tip. Simpler and faster is always appreciated … I can make it more sound in a later iteration if needed.