Create a hierarchical linear regression with 5 levels

Dear all

I’m working on a forecasting project for a fashion company. In this forecast, we have the following hierarchical structure:

1 - Level 1: Shoes
2 - Level 2: Sport_Category_1, Sport_Category_2, Sport_Category_3 …
3 - Level 3: First Color: Black, Blue, White and Green
4 - Level 4: Second Color: Black, Blue, White and Green

I started creating the hierarchical model by “merging” the sport category + first color + second color into a single group (see image on the left). One of the challenges is that I need to forecast new products without historical data (product 4 in the image). Therefore, I need to use the historical information from the group to estimate the demand for products without data. By “merging” all the levels into a single group, I may lose information that would be useful in estimating demand for the new product.

I would like to create a hierarchical (linear regression) model with 5 levels (see image on the right). The slope and intercept will vary at each level, so I will have a prior on the slope and the intercept for each level (sport category, first color, second color and the product). The goal is to improve the estimation of the slope and the intercept for the new products using this hierarchical relationship.

Important point: each product belongs to exactly one group at each level (one sport category, one first color and one second color).

Another point is that if I create the variables using shape=(n_sport_category, n_first_color), not all combinations will exist in my data. I want to avoid creating variables for combinations that I don’t have.

My question is: how can I create the hierarchical model (see the image on the right)? How can I index the problem for each level?

The dataset is attached. I have already created the codes for each level, and the demand is between 0 and 1.

Thank you

This should not be a problem. You can check out this notebook to see how to handle nested data and bake it into hierarchical models. You might need to recode some of the values in your data to make things a bit easier for yourself. For example, if your category 1 has 2 “first colors” and your category 2 has 3, then you might want to code your “first colors” as:

cat1_color1
cat1_color2
cat2_color1
cat2_color2
cat2_color3

Then the number of “first color”-related parameters would be equal to the number of unique “first color” values in your data. Similar things could be done for the second color.
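As a rough illustration of that recoding, here is a minimal sketch in plain Python. The `products` rows, the `encode` helper and all variable names are made up for the example and are not taken from your dataset. It builds nested labels only for the combinations that actually occur, turns them into integer codes, and records which parent group each child group belongs to.

```python
# Hypothetical data: one row per product, (sport category, first color, second color).
products = [
    ("cat1", "black", "blue"),
    ("cat1", "black", "white"),
    ("cat1", "blue", "black"),
    ("cat2", "black", "blue"),
    ("cat2", "green", "green"),
]


def encode(labels):
    """Map labels to integer codes, numbered in order of first appearance."""
    levels = list(dict.fromkeys(labels))  # unique labels, order preserved
    lookup = {label: i for i, label in enumerate(levels)}
    return [lookup[label] for label in labels], levels


# Nested labels: a color is only meaningful inside its parent group,
# so only combinations present in the data get a label.
cat_labels = [cat for cat, _, _ in products]
col1_labels = [f"{cat}_{c1}" for cat, c1, _ in products]
col2_labels = [f"{cat}_{c1}_{c2}" for cat, c1, c2 in products]

cat_idx, cat_levels = encode(cat_labels)
col1_idx, col1_levels = encode(col1_labels)
col2_idx, col2_levels = encode(col2_labels)

# Parent pointers: for each first-color group, the index of its category;
# for each second-color group, the index of its first-color group.
col1_to_cat = [0] * len(col1_levels)
for parent, child in zip(cat_idx, col1_idx):
    col1_to_cat[child] = parent

col2_to_col1 = [0] * len(col2_levels)
for parent, child in zip(col1_idx, col2_idx):
    col2_to_col1[child] = parent

print(col1_levels)   # unique nested "first color" labels
print(col1_to_cat)   # category index of each first-color group
print(col2_to_col1)  # first-color index of each second-color group
```

With index arrays like these, each level only needs a parameter vector as long as its number of unique labels. A child level's prior mean can then be obtained by indexing the parent's parameter vector with the parent pointer (something along the lines of `mu=a_cat[col1_to_cat]` for a hypothetical category-level intercept `a_cat`), and the product-level terms by indexing with `col2_idx`. If your data is in pandas, `pandas.factorize` gives the same kind of integer codes.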