How to deal with missing values when modeling human reinforcement learning data

const · June 11, 2025, 8:31am

Dear all,

I’m new to PyMC and have limited experience in cognitive modeling.

Recently, I’ve been working with human reinforcement learning data. In some trials, participants may time out, resulting in missing data for those trials. I’m wondering how to handle such situations.

Below is a portion of the hierarchical model code I’m testing:

with pm.Model(coords=coords) as hbandit:
    # ----- hyper‑priors (non‑centred) -----
    mu_alpha = pm.Normal("mu_alpha", 0.0, 1.0)
    sigma_alpha = pm.HalfNormal("sigma_alpha", 1.0)
    mu_beta = pm.Normal("mu_beta", 0.0, 1.0)
    sigma_beta = pm.HalfNormal("sigma_beta", 1.0)

    # ----- individual parameters -----
    alpha_logit = pm.Normal(
        "alpha_logit", mu=mu_alpha, sigma=sigma_alpha, dims="subject"
    )
    beta_log = pm.Normal("beta_log", mu=mu_beta, sigma=sigma_beta, dims="subject")

    alpha = pm.Deterministic("alpha", pm.math.sigmoid(alpha_logit), dims="subject")
    beta = pm.Deterministic("beta", pm.math.exp(beta_log), dims="subject")

    # ----- observed data -----
    a_data = pm.Data("actions", actions.astype("int64"))
    r_data = pm.Data("rewards", rewards.astype("int64"))

    # ----- trial‑by‑trial scan -----
    def step(a_t, r_t, Q_prev, alpha, beta, n_subj):
        logits = beta[:, None] * Q_prev  # shape (N,2)
        p = pm.math.softmax(logits, axis=1)
        logp = pt.log(p[pt.arange(n_subj), a_t])  # (N,)

        # chosen Q update
        idx = pt.arange(n_subj)
        q_chosen = Q_prev[idx, a_t]
        q_new = q_chosen + alpha * (r_t - q_chosen)
        Q_next = pt.set_subtensor(Q_prev[idx, a_t], q_new)

        return Q_next, pt.sum(logp)  # scalar log‑likelihood at t

    Q0 = pm.math.zeros((n_subj, 2))
    (_, ll_seq), _ = scan(
        fn=step,
        sequences=[a_data.T, r_data.T],
        outputs_info=[Q0, None],
        non_sequences=[alpha, beta, n_subj],
    )

    pm.Potential("likelihood", pt.sum(ll_seq))

    idata = pm.sample(
        1000,
        tune=1000,
        chains=4,
        target_accept=0.9,
        nuts_sampler="nutpie",
        progressbar=True,
        random_seed=42,
    )

Thank you!

Best,
Kun

bob-carpenter · June 16, 2025, 3:25pm

What is the cause of the timeouts? That will inform how you want to model them.

Are they given a task to complete in a given time and fail to complete it in the given time? Would they have completed given more time? If so, then you can use a censored model. This is common in survival models where you measure when someone dies, but they may still be alive at the end of the trial, so you know their death will be some time after the trial.

Or is it something like a survey question where you ask them their income and they just fail to respond? In that case, you want to take the non-response as information in and of itself if non-response is related to the outcome of interest.

You may also run into identifiability issues with your parameterization of a simplex in terms of softmax of unnormalized log densities.

const · June 17, 2025, 3:10am

Thanks for your helpful reply. In my task, participants must respond within 2 s; if they do not, that trial is recorded as a missing response. A few participants show some misses (typically 0–8 % of their trials, and most have almost none).

At the moment I handle these trials using pt.switch with a response mask to skip their update to Q value & likelihood. The model runs, but I’m not sure whether this is theorically/statistically sound.

Your suggestions sound more advanced. Could you please walk me through them in a bit more detail? I’d also appreciate any pointers to introductory resources so I can learn more about these concepts.

Thanks again for your help!

bob-carpenter · June 17, 2025, 5:15am

It’s usually better to model how data came to be missing than to throw it out.

One standard way to do this is with survival analysis.

The basic idea is that if you observe t[n] for a reaction time for participant n and assume that they would react if given more time, then if you have a model for reaction time of lognormal(mu, sigma), just to take an example, then if you have a point t[n] = 2s then rather than taking t[n] ~ lognromal(mu, sigma), you have to increment the log density with the log of complementary cdf at 2s—this is the probability that t[n] > 2 for a lognormal(mu, sigma). I’m not sure how easy that is to do in PyMC, but an alternative is to define a new parameter t_true[n] with a constraint that t_true[n] > 2 and then just take t_true[n] ~ lognormal(mu, sigma)`. I also don’t know how easy that is. The tutorial should provide some guidance.

If, on the other hand, this isn’t a censoring problem and it’s a “refuse to answer” problem, then you want to build a model of the probability of refusing to answer given other covariates and/or the outcome. This lets non-answering be informative.

const · June 18, 2025, 2:25am

Thank you! I will check the tutorial and test it first.

Topic		Replies	Views
Is this a censored-data problem or not? version agnostic	6	269	October 27, 2023
Modeling reinforcement learning of human participant using PyMC3 Questions	8	3193	June 7, 2023
Modelling censored data in pymc v4: a simple Gaussian random effects example v5 development , modeling	0	387	August 30, 2022
Could Someone Give me Advice for Handling Missing Data in Bayesian Modeling with PyMC? v5 theano , modeling	1	119	January 31, 2025
Translating Censored Data example from PyMC2 to PyMC3 Questions	2	706	July 27, 2017

How to deal with missing values when modeling human reinforcement learning data

Related topics