Getting started with Bayesian: expected values

IanW · September 19, 2023, 12:43pm

Hello, total beginner here. I’ve been interested in, and reading about, Bayesian analysis for several months now but never had a simple problem in mind until now. I’d be most grateful if someone could give me some guidance on how to proceed with it.

The data I’d like to explore is the US airport wait times published at awt.cbp.gov. I’ve downloaded the entire dataset and have an app that can filter it by airport terminal and dates, to give some idea of the answer to questions like “If I fly into JFK at 1100 on a Wednesday in April on AA, how long should I expect my maximum wait time to be?”, which my app filters and shows as:

Now, I could perform a frequentist analysis on this and come up with a distribution of times, but can I do better with a Bayesian analysis? Could I have a distribution that is taken principally from this series, but is also informed by neighboring weekdays and hours, as well as, maybe, the number of flights arriving in that time window?

In order to do that, do I have to do something like take a flatish prior to build a posterior over all my data, then use that as a prior to the terminal/day/time in question?

Are there any examples I could use to get me going?

Simon · September 19, 2023, 2:39pm

One option is to build a hierarchical model, indexed by airport, day and time. You can find the classic PyMC example for hierarchical models here: A Primer on Bayesian Methods for Multilevel Modeling — PyMC example gallery.

Another option is to use a time-series model. That’s more complex, but probably more appropriate for the type of data you’re working with. There are plenty of resources on time series in PyMC that you can find on Google or YouTube. You can find a very good intro here: 6. Time Series — Bayesian Modeling and Computation in Python.

Gaussian random walks and Gaussian processes (GP) are possible alternatives for time series. Here’s an example of a GP regression: Gaussian Process Regression — PyMC3 3.1rc3 documentation.

If you’re interested in going deeper into these problems, pymc-experimental recently added state-space models: `pymc-experimental` now includes state spaces models!

Maybe this is too much to take in at once. But it may be useful to have access to several resources in one place, and you can work through them incrementally. Many people (as I did) started with the hierarchical model example and very slowly explored more complex/specific models. It may be worth starting with a simpler example as well (you’ll probably need different parametrisations for waiting time data, e.g. a Gamma likelihood, etc.). I hope this helps.

drbenvincent · September 23, 2023, 11:31am

Hi @IanW. If this is one of your first models, then I’d recommend starting off with the “time series as regression” type approach. The chapter in Martin, Kumar & Lao is great on this: 6. Time Series — Bayesian Modeling and Computation in Python. But essentially you ignore the temporal ordering of observations and just model your outcomes as some linear combination of predictors. You could have categorical predictors, such as airlines or airport or whatever. You could also have dummy variables that indicate holiday periods. There are lots of ways to then embellish this model so the approach can become half decent. It could be worth thinking about if you don’t want to dive into actual time series modeling at this point.
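As a rough illustration of the regression view, the sketch below builds a design matrix from an hour-of-day categorical, a holiday dummy and a flight count, then computes the closed-form posterior of a Bayesian linear regression on log wait time. This is a simplification and not the book's code: it uses plain NumPy, simulated data, a zero-mean Gaussian prior on the coefficients, and it assumes the noise standard deviation is known. All names and numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Simulated predictors (hypothetical stand-ins for the real columns).
hour = rng.integers(0, 24, size=n)                  # hour of arrival
holiday = (rng.random(n) < 0.1).astype(float)       # 1.0 during holiday periods
flights = rng.poisson(8, size=n).astype(float)      # arrivals in the time window

# Simulated outcome: log of the maximum wait in minutes.
noise_sd = 0.2
log_wait = (
    3.0
    + 0.3 * holiday
    + 0.05 * (flights - 8)
    + 0.2 * np.sin(2 * np.pi * hour / 24)
    + rng.normal(0.0, noise_sd, size=n)
)

# Design matrix: intercept, holiday dummy, centred flight count,
# and one-hot hour columns with hour 0 dropped as the baseline.
hour_dummies = np.eye(24)[hour][:, 1:]
X = np.column_stack([np.ones(n), holiday, flights - flights.mean(), hour_dummies])

# Prior: coefficients ~ Normal(0, prior_sd^2), independent (weakly informative).
prior_sd = 10.0

# Conjugate Gaussian posterior with known noise_sd:
#   precision = X'X / noise_sd^2 + I / prior_sd^2
#   mean      = precision^{-1} X'y / noise_sd^2
precision = X.T @ X / noise_sd**2 + np.eye(X.shape[1]) / prior_sd**2
post_cov = np.linalg.inv(precision)
post_mean = post_cov @ (X.T @ log_wait) / noise_sd**2

# Posterior predictive for a non-holiday arrival at 11:00 with 10 flights.
x_new = np.zeros(X.shape[1])
x_new[0] = 1.0                          # intercept
x_new[2] = 10.0 - flights.mean()        # centred flight count
x_new[3 + (11 - 1)] = 1.0               # hour 11 dummy (hour 0 is the baseline)
pred_mean = x_new @ post_mean
pred_sd = np.sqrt(x_new @ post_cov @ x_new + noise_sd**2)

print(f"holiday effect on log wait: {post_mean[1]:.2f}")
print(f"predicted median wait: {np.exp(pred_mean):.1f} min")
```

The temporal ordering of the rows never enters the calculation, which is the point of the approach. In practice the noise scale would be given a prior too, and the same predictors could be moved into a PyMC model.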

