Hello, total beginner here. I’ve been interested in, and reading about, Bayesian analysis for several months now, but never had a simple problem in mind until now. I’d be most grateful if someone could give me some guidance on how to proceed with it.
The data I’d like to explore is US airport wait times, as published at awt.cbp.gov. I’ve downloaded the entire dataset and have an app that can filter it by airport terminal and dates, in order to give some idea of the answer to questions like “If I fly into JFK at 1100 on a Wednesday in April on AA, how long should I expect my maximum wait time to be?”, which my app filters and shows as:
Now, I could perform a frequentist analysis on this and come up with a distribution of times, but can I do better with a Bayesian analysis? Could I have a distribution that is taken principally from this series, but is also informed by neighboring weekdays and hours, as well as, maybe, the number of flights arriving in that time window?
In order to do that, would I do something like take a flattish prior, build a posterior over all my data, then use that as the prior for the terminal/day/time in question?
Are there any examples I could use to get me going?
Another option is to use a time-series model. That’s more complex, but probably more appropriate for the type of data you’re working with. There are plenty of resources on time series in PyMC that you can find on Google or YouTube. A very good intro is chapter 6, “Time Series”, of Bayesian Modeling and Computation in Python.
Maybe this is too much to take in at once, but it may be useful to have several resources in one place so you can work through them incrementally. Many people (as I did) started with the hierarchical model example and slowly explored more complex or specific models from there. It may be worth starting with a simpler example as well (you’ll probably need a different parametrisation for waiting-time data, e.g. a Gamma likelihood). I hope this helps.
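To make the partial-pooling idea behind a hierarchical model a bit more concrete, here is a minimal, dependency-free sketch. It is not a PyMC model, and it uses a normal approximation with fixed variances rather than the Gamma likelihood suggested above. The function name `partial_pool`, the `sigma` and `tau` values, and the wait times are all made up for illustration.

```python
from statistics import mean


def partial_pool(groups, sigma=20.0, tau=10.0):
    """Shrink each group's mean toward the overall mean.

    Normal-normal shrinkage with fixed (assumed) variances:
      sigma: within-group std of wait times, in minutes
      tau:   between-group std of the true group means, in minutes
    The overall mean is a simple plug-in estimate, not a full posterior.
    """
    grand = mean(x for obs in groups.values() for x in obs)
    pooled = {}
    for name, obs in groups.items():
        n = len(obs)
        # Weight on the group's own data grows with its sample size.
        w = (n / sigma**2) / (n / sigma**2 + 1 / tau**2)
        pooled[name] = w * mean(obs) + (1 - w) * grand
    return pooled


# Made-up maximum wait times (minutes) per weekday/hour slot.
waits = {
    "Wed 10:00": [35.0, 42.0, 38.0, 40.0, 45.0, 37.0],
    "Wed 11:00": [60.0, 70.0],  # sparse slot
    "Wed 12:00": [30.0, 28.0, 33.0, 31.0, 29.0],
}

for slot, est in partial_pool(waits).items():
    print(f"{slot}: raw mean {mean(waits[slot]):.1f}, pooled {est:.1f}")
```

The sparse slot is pulled toward the overall mean more strongly than the well-observed ones, which is the sense in which neighbouring hours "inform" each other. A full hierarchical model would estimate `sigma` and `tau` from the data instead of fixing them.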
Hi @IanW. If this is one of your first models, then I’d recommend starting off with the “time series as regression” type approach. The chapter in Martin, Kumar & Lao is great on this: 6. Time Series — Bayesian Modeling and Computation in Python. But essentially you ignore the temporal ordering of observations and just model your outcomes as some linear combination of predictors. You could have categorical predictors, such as airlines or airport or whatever. You could also have dummy variables that indicate holiday periods. There are lots of ways to then embellish this model so the approach can become half decent. It could be worth thinking about if you didn’t want to dive into actual time series modeling at this point.
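As a rough illustration of the "time series as regression" idea, the sketch below turns each observation into a row of dummy variables (weekday, airline, holiday) and fits the coefficients in closed form. It is only a stand-in for a proper PyMC model: it gives the posterior mean under an assumed zero-mean normal prior on every coefficient with known noise, which is the same as ridge regression. The helper names (`encode`, `solve`, `fit_map`), the `ridge` parameter, and the data are all hypothetical.

```python
from datetime import date, timedelta


def encode(row, airlines, holidays):
    """Turn one observation into a row of the design matrix."""
    x = [1.0]  # intercept
    # Weekday dummies; Monday (weekday() == 0) is the reference level.
    x += [1.0 if row["day"].weekday() == d else 0.0 for d in range(1, 7)]
    # Airline dummies; the first airline is the reference level.
    x += [1.0 if row["airline"] == a else 0.0 for a in airlines[1:]]
    # Holiday indicator.
    x.append(1.0 if row["day"] in holidays else 0.0)
    return x


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_map(X, y, ridge=1.0):
    """Posterior mean of the coefficients: (X'X + ridge * I)^-1 X'y.

    ridge is the assumed ratio of noise variance to prior variance.
    Shrinking the intercept too is a simplification of this sketch.
    """
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


# Synthetic, noise-free data: base wait 30 min, +8 for one airline, +25 on a holiday.
airlines = ["AA", "DL"]
holidays = {date(2023, 7, 4)}
start = date(2023, 6, 19)
rows = [{"day": start + timedelta(days=i), "airline": a}
        for i in range(28) for a in airlines]
y = [30.0
     + (8.0 if r["airline"] == "DL" else 0.0)
     + (25.0 if r["day"] in holidays else 0.0)
     for r in rows]

X = [encode(r, airlines, holidays) for r in rows]
beta = fit_map(X, y, ridge=1e-6)
print([round(b, 2) for b in beta])
```

Note that the temporal order of the rows never enters the fit; only the predictors do. In a PyMC version the same design matrix would feed a linear predictor inside the model, with a likelihood better suited to positive wait times.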