New PyMCon Talk Released: Missing Value Imputation with Item Response Theory by Allen Downey & Ricardo Vieira

Hi Everyone :raising_hand_man:

Come to our next PyMCon Web Series! We’re talking about ‘Missing Value Imputation with Item Response Theory’.


  1. Allen Downey, Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats and other books related to computer science and data science.
  2. Ricardo Vieira, PyMC developer and data scientist at PyMC Labs

Event type: Recorded Talk with Live Q&A
Q&A Date/Time: 2023-12-15T15:00:00Z (subscribe here for email updates)
Register for Q&A: Meetup event (to get the Zoom link)
Website: PyMCon Events · PyMCon Web Series


In many large surveys, not every respondent is asked every question, and not every respondent answers the questions they are asked. So, how can we compare people who answer different sets of questions? One solution is to use item response theory (IRT) to impute missing responses—and nothing pairs better with IRT than Bayesian methods!

In this talk, we will report the results of a friendly competition—a bake-off—between two approaches to this problem: one using grid algorithms and a simplified model, the other using PyMC and a more detailed model. We’ll discuss the implementations, compare the results, and outline their pros and cons.
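To give a feel for the grid side of the bake-off, here is a minimal sketch. It is not the speakers' code. It assumes a one-parameter logistic (Rasch-style) IRT model in which the probability of a positive response is `sigmoid(ability - difficulty)`, it treats the item difficulties as already known, and it uses a standard normal prior on ability. All function names and numbers are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def ability_posterior(responses, difficulties, grid):
    """Posterior over a respondent's ability on a discrete grid.

    responses: list of 1, 0, or None (None marks a missing answer).
    difficulties: assumed-known item difficulties (an illustration-only assumption).
    """
    # Unnormalized standard normal prior on ability (assumed).
    weights = [math.exp(-0.5 * theta ** 2) for theta in grid]
    for i, theta in enumerate(grid):
        for y, b in zip(responses, difficulties):
            if y is None:
                continue  # missing answers contribute no likelihood
            p = sigmoid(theta - b)
            weights[i] *= p if y == 1 else 1.0 - p
    total = sum(weights)
    return [w / total for w in weights]


def impute_missing(responses, difficulties, grid):
    """Posterior-predictive probability of a positive answer for each missing item."""
    posterior = ability_posterior(responses, difficulties, grid)
    imputed = {}
    for j, (y, b) in enumerate(zip(responses, difficulties)):
        if y is None:
            imputed[j] = sum(
                w * sigmoid(theta - b) for w, theta in zip(posterior, grid)
            )
    return imputed


# Hypothetical example: four items, the third one unanswered.
grid = [-4 + 0.1 * k for k in range(81)]
difficulties = [-1.0, 0.0, 1.0, 2.0]
responses = [1, 1, None, 0]
imputed = impute_missing(responses, difficulties, grid)
print(imputed)
```

The returned dictionary maps the index of each unanswered item to the estimated probability of a positive answer. A fuller model would also have to estimate the item difficulties from data rather than assume them.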


Async Talk:
Interview video:
Slides: Bayesian Bake-Off: Grids, MCMC, and IRT - Google Slides

Event Format:

Like other PyMCon events, this one features an asynchronous component along with a synchronous Q&A session. Stay tuned for the prerecorded talk; we will be sharing it soon.

There will be a live Q&A on December 15, 2023, at 7:00 am PT (15:00 UTC). Register for the event and bring your questions.

:pushpin: Here is the link:

About the Speakers:

  1. Allen Downey
    Allen is a curriculum designer at Brilliant and Professor Emeritus at Olin College, and the author of Think Python, Think Bayes, Think Stats and other books related to computer science and data science.
    He writes a blog about Bayesian statistics and related topics called Probably Overthinking It. He is also working on a book of the same name, to be published by University of Chicago Press in 2023. If you would like to get an occasional update about the book, please join his mailing list.

    :link: Connect with Allen:
    :point_right: Website:
    :point_right: LinkedIn:
    :point_right: Twitter:
    :point_right: Mastodon:

  2. Ricardo Vieira
    Ricardo Vieira is a PyMC developer and data scientist at PyMC Labs. He spent several years teaching himself Statistics and Computer Science at the expense of his official degrees in Psychology and Neuroscience.

    :link: Connect with Ricardo:
    :point_right: Website: Blog | As long as everything adds up to one
    :point_right: GitHub: ricardoV94 (Ricardo Vieira) · GitHub


The full interview with Allen and Ricardo is now available on our YouTube channel, where they share insights into their statistical and PyMC journeys, along with some invaluable advice. :rocket:

:bulb: Check out the interview now, and stay tuned for their asynchronous talk tomorrow.

The Async talk and slides for this PyMCon event are now live on our YouTube channel!

Watch the Async Talk Now:

Explore the Slides:

In case you haven’t registered for the Q&A session yet, do RSVP now to get the Zoom link: :point_down:

Save the date, watch the talk, and come prepared with your questions on December 15th! Let’s make it an insightful and engaging session!

Question for the 12/15/2023 Q&A: I’d like to know your opinion about the feasibility of the following scenario. Imagine we have 100 students with test scores from Physics I, Physics II, and Nuclear Physics. We also have some demographic data. In addition, there are 400 students with test scores from Physics I and Physics II only. I want to predict the Nuclear Physics test scores for the 400 students using the methods presented in the async talk. Am I asking too much?


This was the second question in the chat:

I’m trying to predict the survival of ropes, and I can measure things such as the force applied to the rope and the time the rope is in service. However, one of the things that determines the life of the rope is abrasion, which I can’t measure. Can I impute that missing information in a model, or is that too much?

Is anyone aware of public data sets that are well suited for use in exploring probabilistic IRT?

If you missed our live Q&A session, the recording is now available on YouTube:

I found a near-ideal dataset in this paper! Essay scoring by multiple graders per student with many tens of thousands of data points.
