The Bayesian Workflow: Building a COVID-19 Model by Thomas Wiecki

Talk Abstract

In this tutorial we will build a COVID-19 model from scratch.


Part 1

Part 2

Thomas Wiecki

Thomas is the founder of PyMC Labs, a Bayesian consulting firm.

This is a PyMCon 2020 talk

Learn more about PyMCon!

PyMCon is an asynchronous-first virtual conference for the Bayesian community.

We posted all the talks here on Discourse on October 24th, one week before the live PyMCon session, for everyone to see and discuss at their own pace.

If you are available on October 31st, you can register for the live session here! If you are not, don’t worry: all the talks are already available here on Discourse (keynotes will be posted after the conference), and you can network here on Discourse and on our Zulip.

We value the participation of each member of the PyMC community and want all attendees to have an enjoyable and fulfilling experience. Accordingly, all attendees are expected to show respect and courtesy to other attendees throughout the conference and at all conference events. Everyone taking part in PyMCon activities must abide by the PyMCon Code of Conduct. You can report any incident through this form.

If you want to support PyMCon and the PyMC community but you can’t attend the live session, consider donating to PyMC

Do you have suggestions to improve PyMCon? We have an anonymous suggestion box waiting for you

Have you enjoyed PyMCon? Please fill out our PyMCon attendee survey. It is open to both async PyMCon attendees and people taking part in the live session.


Hey Thomas, thanks for this explanation of the model. It certainly makes something accessible to me that otherwise wouldn’t be! Feel free to ignore these questions if they are too onerous. So the Gaussian random walk is flexibly capturing all the structure related to policy and individual interventions that decrease the virus’s reproduction rate. Can you give any more intuition behind ‘the sampler will try and fit the rope to best explain the data’? In a longer-term predictive setting, would it just go back to random, or would it actually learn some structure? Could you somehow include an observed variable for the social distancing policies that were implemented, then pass a range of forward policy options into the prediction and compare them?


Really nice talk, Thomas. Thanks very much! One thing we see in the data is a clear weekly cycle, which I assume is due to some combination of the workweek (fewer people tested on weekends?) and the data reporting/collection process. Knowing that these effects are artifacts, I wonder if it would improve the results to smooth the data before running the model.


Or add a weekly component to the Gaussian Random Walk, maybe something from Hierarchical Time Series With Prophet and PyMC3 by Matthijs Brouns :wink:
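For readers who want to see what the smoothing idea raised above amounts to, here is a minimal sketch of a trailing 7-day mean. The daily counts are made up for illustration, and `rolling_mean` is a hypothetical helper, not something from the talk or the notebook. Because the window covers exactly one full week, a purely weekly reporting pattern averages out.

```python
def rolling_mean(values, window=7):
    """Trailing mean over `window` consecutive values (hypothetical helper)."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Made-up daily counts with a weekday effect: fewer reports on the weekend.
weekly_pattern = [100, 120, 130, 125, 115, 60, 50]
daily_counts = weekly_pattern * 4  # four identical weeks

smoothed = rolling_mean(daily_counts, window=7)

# Every 7-day window contains each weekday once, so the cycle disappears.
print(smoothed[:3])
```

On real data the underlying trend would remain after smoothing, but smoothing also blurs sudden changes and hides the extra noise from the likelihood, which is one reason to model the reporting effect explicitly instead.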


Thanks, Allan.

This weekday effect is mostly driven by the different numbers of tests administered. This is actually already included in the final model when we take testing frequency into account (I only mention that part briefly but the NB has more info). The effect of that is the smoothing you’re talking about.


Yes, exactly, the random walk is modeling the changes in people’s behavior, policy, etc., which all affect the effective reproduction rate.

It learns that structure only where it has data. The further you forecast, the more uncertainty you will have in the reproduction factor.
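A small simulation can illustrate why the uncertainty grows with the forecast horizon. This is only a sketch using plain Python rather than the model from the talk: the step size `step_sd` and the function `simulate_walks` are assumptions chosen for illustration. Beyond the last observation, a Gaussian random walk just keeps adding independent steps, so the spread of possible paths grows roughly like `step_sd * sqrt(days_ahead)`.

```python
import random
import statistics
from itertools import accumulate

def simulate_walks(n_walks, horizon, step_sd, seed=0):
    """Simulate random-walk continuations starting from 0 (illustrative only)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        steps = [rng.gauss(0.0, step_sd) for _ in range(horizon)]
        walks.append(list(accumulate(steps)))  # cumulative sum of the steps
    return walks

walks = simulate_walks(n_walks=2000, horizon=30, step_sd=0.1)

# Standard deviation across the simulated paths, for each day ahead.
spread_by_day = [statistics.pstdev(w[day] for w in walks) for day in range(30)]

print(f"spread 1 day ahead:   {spread_by_day[0]:.3f}")
print(f"spread 30 days ahead: {spread_by_day[29]:.3f}")
```

So without new data the walk does not "learn" a direction for the future; it only widens the range of plausible values around the last estimate.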

What you mention about including things like social distancing policies is a great idea. You could add this (and other factors) as covariates to the model, like in a regression model and then use that for prediction.
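To make the covariate idea concrete, here is a rough sketch of how a policy indicator could shift the reproduction factor. Everything in it is an assumption for illustration: the log-link form, the coefficient `beta_policy`, and the numbers are invented, and none of it is taken from the talk's model. In a real model the baseline and the coefficient would be estimated from data rather than fixed.

```python
import math

def reproduction_factor(log_r_baseline, beta_policy, policy_on):
    """R_t = exp(baseline_t + beta * policy_t); an assumed regression form."""
    return [
        math.exp(base + beta_policy * flag)
        for base, flag in zip(log_r_baseline, policy_on)
    ]

# Hypothetical values: a flat baseline R of 1.3 and a policy that lowers log R.
log_r_baseline = [math.log(1.3)] * 5
beta_policy = -0.4

# Two forward scenarios to compare: no policy, or policy starting on day 2.
no_policy = [0, 0, 0, 0, 0]
policy_from_day_2 = [0, 0, 1, 1, 1]

r_no_policy = reproduction_factor(log_r_baseline, beta_policy, no_policy)
r_with_policy = reproduction_factor(log_r_baseline, beta_policy, policy_from_day_2)

for day, (a, b) in enumerate(zip(r_no_policy, r_with_policy)):
    print(f"day {day}: R without policy = {a:.2f}, with policy = {b:.2f}")
```

Comparing scenarios then comes down to feeding different `policy_on` sequences through the fitted model and looking at the resulting predictions.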


Nice, complete talk, thanks Thomas! Knowing your challenges(!) I looked at your notebook, and it seems like you fixed the delay model with an exposure clipping prior. It all worked really well for me :slight_smile: :ok_hand:

Slightly off topic: when I try to apply a random walk model to a new dataset with Data[], the model retains the shape and mu of the random walk steps. I want to relearn these for the new random walk path, on the fly as it were, because otherwise the old data biases the new data. I guess I would like the random walk to be a distribution I can sample from as a transient while learning the other model parameters and the posterior, without the walk steps being embedded in the model (mu_0, mus_1 … etc.). I can’t see a way to do that, and the GRW does not seem to have a dist or random sampling method, so I can’t use it like that. It seems I have to set the new data as ‘missing data’ in the model and run the model again, so that it imputes the out-of-sample data. Hope this makes sense! Anyway, great talk, very useful, and great conference :smile:

Thanks Marcus.
I don’t quite understand. Are you talking about prediction?

I think the GRW has a .random() method, no?

In any case, you can always do pm.Normal().cumsum() to create a random-walk manually.
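The cumulative-sum construction can be sketched in plain Python to show why it gives a random walk. This is not PyMC code: `random.gauss` stands in for the normally distributed steps and `itertools.accumulate` stands in for `cumsum()`, and the step size of 0.1 is an arbitrary choice for illustration.

```python
import random
from itertools import accumulate

rng = random.Random(42)

# Independent normal steps; in a model these would be random variables.
steps = [rng.gauss(0.0, 0.1) for _ in range(10)]

# The random walk is the running total of the steps.
walk = list(accumulate(steps))

# Same thing written as an explicit recursion: walk[t] = walk[t-1] + step[t].
walk_loop = []
position = 0.0
for step in steps:
    position += step
    walk_loop.append(position)

# Differencing the walk recovers the steps (up to floating-point rounding).
recovered_steps = [walk[0]] + [b - a for a, b in zip(walk, walk[1:])]

print(walk == walk_loop)
```

Writing the walk this way keeps the steps as ordinary normal variables, which can make it easier to swap in a different step distribution or extend the walk with extra steps for prediction.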
Not sure that helps but I didn’t quite understand the question.


Thanks Thomas,
yes - I am trying to do prediction with a pre-trained model that uses a GRW, but I do not want to persist the random walk steps in the model state, as they are just transitory in calculating my principal model parameters. You could say I am just using the GRW for smoothing.

Yes, the GRW has a random() method - I was looking at the multivariate GRW so got that wrong :slightly_frowning_face: Thanks for the tip about cumsum() - that sounds great, I shall investigate further.


I cannot find the notebooks on your GitHub repo.
Are they still available somewhere?

Regards Hans-Peter

Here you go:

Very much appreciated!
Thank you.

And the 1 million dollar question.
Did you find out what caused the model to go awry at the last step?

Ha, no I haven’t. Internet fame points if you do and post it here :).