Dear Bayesians,
I would like to learn how you would approach the following situation:
I have data about a special promotion type. Customers have been contacted about certain products. I have many categorical variables that classify the customers, and for the sake of simplicity, I will pick three here to make my question clear:
- age groups of customers
- 2-digit zip codes
- recency of purchase (last 0-2 years, 2-5 years, 5+ years)
For each customer, I know whether they converted or not.
Now, I would like to fit a model in which the influence of these three categories is estimated. The aim of the model is to learn how to select customers for future promotions. For example, you can assume that 1 million customers could be available, but we only want to select the best 100k customers based on these categories.
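To make the selection step concrete, here is a minimal sketch, assuming the fitted model has already produced one predicted conversion probability per customer. The customer ids, the probabilities, and the helper name `select_top` are hypothetical and only serve as illustration.

```python
import heapq

def select_top(predicted, k):
    """Return the ids of the k customers with the highest predicted probability."""
    best = heapq.nlargest(k, predicted.items(), key=lambda item: item[1])
    return [customer_id for customer_id, _ in best]

# Hypothetical predicted conversion probabilities (e.g. posterior means).
predicted = {"c1": 0.02, "c2": 0.11, "c3": 0.07, "c4": 0.01, "c5": 0.09}

# Keep the best 2 of 5 here; in the real setting this would be 100k of 1 million.
selected = select_top(predicted, 2)
```

Ranking by a single point estimate ignores posterior uncertainty, so this is only the simplest possible selection rule.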
I could now create a very simple model like this:
"conversion ~ (1|age_group) + (1|zip_code) + (1|recency)", family="bernoulli", ...
However, some of these “groups” would have really small sample sizes, especially some zip codes. Also, age_group and recency seem to be much more important features.
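A quick way to see which levels are thin is to count rows per level before fitting. The sketch below uses a handful of made-up rows and a hypothetical threshold `min_n`; the column order and the level labels are assumptions for illustration, not my real data.

```python
from collections import Counter

# Hypothetical rows: (age_group, zip_code, recency, converted)
rows = [
    ("30-39", "10", "0-2y", 1),
    ("30-39", "10", "2-5y", 0),
    ("40-49", "10", "0-2y", 0),
    ("40-49", "80", "5+y", 0),
    ("50-59", "99", "0-2y", 1),
]

def level_counts(rows, column):
    """Count how many rows fall into each level of the given column index."""
    return Counter(row[column] for row in rows)

def sparse_levels(counts, min_n):
    """Return the levels with fewer than min_n rows, sorted for stable output."""
    return sorted(level for level, n in counts.items() if n < min_n)

zip_counts = level_counts(rows, 1)                # column 1 holds zip_code
sparse_zips = sparse_levels(zip_counts, min_n=2)  # zip codes seen fewer than 2 times
```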
I would therefore lean towards a hierarchy in which the zip code effect is only estimated within age_group and recency. This could look like this:
"conversion ~ (1|age_group) + (1|recency) + (1|age_group:zip_code) + (1|recency:zip_code)", family="bernoulli", ...
However, this leaves open the relation between age group and recency. These could also be in a hierarchy, and the question is whether I should choose (1|age_group:recency) or (1|recency:age_group) here.
Now, this is only an example with 3 categories. You can imagine that this problem will explode when I have 10 categories. Which category depends on which?
The question now is how you would approach such a modeling situation. Would you start with a simple model without any conditional variables and then just try things out and introduce some hierarchical relations? Given that I have a lot of data (millions of rows), fitting each model takes quite some time, so I am looking for ways to make this “trial and error” approach more efficient.
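One thing that might reduce the fitting time, assuming all predictors really are categorical: rows that share the same combination of levels have the same linear predictor, so the millions of Bernoulli rows can be collapsed into one (conversions, trials) pair per combination and modeled with a binomial family instead. The sketch below only shows the aggregation step; the rows and the function name `aggregate_cells` are hypothetical.

```python
from collections import defaultdict

def aggregate_cells(rows):
    """Collapse 0/1 rows into (conversions, trials) per category combination."""
    cells = defaultdict(lambda: [0, 0])
    for age_group, zip_code, recency, converted in rows:
        cell = cells[(age_group, zip_code, recency)]
        cell[0] += converted  # number of conversions in this cell
        cell[1] += 1          # number of customers in this cell
    return {key: tuple(value) for key, value in cells.items()}

# Hypothetical rows: (age_group, zip_code, recency, converted)
rows = [
    ("30-39", "10", "0-2y", 1),
    ("30-39", "10", "0-2y", 0),
    ("30-39", "10", "0-2y", 1),
    ("40-49", "80", "5+y", 0),
]

cells = aggregate_cells(rows)
```

With 10 categories the number of observed combinations can still be large, so how much this helps depends on how many distinct cells actually occur.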
I would be interested in how you would go through the modeling process in such a situation.
Best regards
Matthias