Hello, I am working on a portfolio project to show that I am learning about MMM. Rather than create a simulated dataset, I chose to use the data provided on the following Kaggle page:
https://www.kaggle.com/datasets/mediaearth/traditional-and-digital-media-impact-on-sales/
The data is monthly and doesn’t list any demographic information on the customers, the country where the advertising is being done, or what the company sells. Based on the profile of the dataset poster, I am assuming the country in question is Singapore, and I am trying to identify appropriate external variables to bring in. I am looking at CPI with month-over-month change as one external variable. I have also considered adding a variable for whether the national cricket team won that month, since cricket sponsorship is one of the ad channels. Finally, I am trying to decide on an appropriate way to capture national holidays in these data. Would a variable with a count of non-working days per month be appropriate, or should I simply use a binary variable indicating that a month contains at least one holiday? I worry the preponderance of zeroes would make the variable less informative in that context.
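To make the two holiday encodings concrete, here is a minimal sketch in plain Python of what I have in mind. The names `HOLIDAYS`, `month_features`, and `pct_change` are my own placeholders, and the holiday list is only an illustrative subset of fixed-date Singapore public holidays. A real version would need the full official calendar, including movable holidays and in-lieu days, which this sketch ignores.

```python
import calendar
from datetime import date

# Assumption: illustrative subset of fixed-date Singapore public holidays.
# Movable holidays and "in lieu" days off are NOT handled here.
HOLIDAYS = [date(2023, 1, 1), date(2023, 5, 1), date(2023, 8, 9), date(2023, 12, 25)]


def month_features(year, month, holidays):
    """Return candidate holiday features for one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    in_month = {d for d in holidays if d.year == year and d.month == month}

    weekdays = 0
    weekday_holidays = 0
    for day in range(1, days_in_month + 1):
        current = date(year, month, day)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            weekdays += 1
            if current in in_month:
                weekday_holidays += 1

    working_days = weekdays - weekday_holidays
    return {
        "holiday_count": len(in_month),          # number of holidays in the month
        "has_holiday": int(bool(in_month)),      # binary encoding
        "working_days": working_days,            # weekdays that are not holidays
        "non_working_days": days_in_month - working_days,
    }


def pct_change(series):
    """Period-over-period change, e.g. for a monthly CPI series."""
    return [None] + [(b - a) / a for a, b in zip(series, series[1:])]


if __name__ == "__main__":
    for m in range(1, 13):
        print(m, month_features(2023, m, HOLIDAYS))
```

One thing this shows is that the non-working-day count varies from month to month even without holidays, because the number of weekends differs, so it never collapses to mostly zeroes the way a binary flag or a raw holiday count can.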
If you are interested in seeing the work in progress, my GitHub is linked below: