Choosing an appropriate model for reaction times

Thanks for the suggestion. Unfortunately, this approach doesn’t work for me. Maybe I’m parametrising something incorrectly, but I cannot find an explicit parametrisation of the likelihood in the paper you sent me. So, just to try it out, I applied a pm.ExGaussian likelihood to the model:

import pymc as pm  # or `import pymc3 as pm`, depending on the version in use

# Assumed to be defined elsewhere: `conds` (number of conditions), `inter` and
# `slope` (design matrices), `Worry` (covariate) and `RT` (observed reaction times).
with pm.Model() as mod:

    # non-centred, per-condition intercepts
    alpha_m = pm.Normal('alpha_m', 0, 1)
    alpha_s = pm.HalfNormal('alpha_s', 1)
    alpha_t = pm.Normal('alpha_t', 0, 1, shape=conds)
    alpha = pm.Deterministic('alpha', alpha_m + alpha_t*alpha_s)

    # non-centred, per-condition slopes
    beta_m = pm.Normal('beta_m', 0, 1)
    beta_s = pm.HalfNormal('beta_s', 1)
    beta_t = pm.Normal('beta_t', 0, 1, shape=conds)
    beta = pm.Deterministic('beta', beta_m + beta_t*beta_s)

    # linear predictor for the mean of the Gaussian component
    mu = pm.math.dot(inter, alpha) + pm.math.dot(slope, beta)*Worry
    sigma = pm.Exponential('sigma', 5)

    # exponentially modified Gaussian likelihood, nu fixed at 6 just to try it out
    y = pm.ExGaussian('RT', mu=mu, sigma=sigma, nu=6, observed=RT)

My guess was that this would probably break the model, as the interactions present in the design may leave some conditions with ceiling effects or very few observations. Indeed, the models break with bad-energy errors, and this happens with both untransformed and transformed RTs. But, again, I’m probably applying this wrongly.
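I suppose one way to check whether this is a prior problem rather than a likelihood problem would be a prior predictive check on the same model; a minimal sketch (the number of draws is arbitrary):

with mod:
    # draw RTs implied by the priors alone, to see whether they fall on a
    # plausible scale for each condition (recent PyMC versions return an
    # InferenceData object)
    prior_pred = pm.sample_prior_predictive(500)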

Even so, my impression is that many of these approaches work very well in theory (e.g. in simulations) or with rather ad-hoc data, in this case from a stop-signal paradigm. Instead, I’m using a stimulus recognition paradigm, where participants have to answer on every trial (i.e. press left or right), so misses and false-alarm responses are not included in the analysis, because the interest is only in RTs for correct responses (I guess alternatives could be argued). For instance, in the paper you link they have between 2000 and 3000 observations, without a complex interaction, and that seems to work neatly. They also point out insensitivity to priors. In my case, the models are very sensitive to priors, and I need over 9000 to 10000 observations for them to become less sensitive (e.g. using an exponential hyperprior for the StudentT greatly shifts the intercept downwards, but a fixed prior does not).
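To make that comparison concrete, the two prior choices I have in mind look roughly like this. This is a deliberately stripped-down sketch, not the full hierarchical model: I’m assuming the exponential hyperprior goes on the degrees of freedom nu, zlogRT stands for the standardised log RTs, and the specific rates/values are only illustrative:

import numpy as np

# placeholder: standardised log of the reaction times
logRT = np.log(RT)
zlogRT = (logRT - logRT.mean()) / logRT.std()

with pm.Model() as mod_hyper:
    b0 = pm.Normal('b0', 0, 1)               # intercept
    sigma = pm.Exponential('sigma', 1)
    nu = pm.Exponential('nu', 1/30)           # exponential hyperprior on nu
    pm.StudentT('obs', nu=nu, mu=b0, sigma=sigma, observed=zlogRT)

with pm.Model() as mod_fixed:
    b0 = pm.Normal('b0', 0, 1)
    sigma = pm.Exponential('sigma', 1)
    pm.StudentT('obs', nu=10, mu=b0, sigma=sigma, observed=zlogRT)  # fixed nu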

In any case, it is a very interesting discussion. As far as I understand, the modelling of RTs is an ongoing issue. I’m not sure whether the ExGaussian approach is taken by many or few, but it’s certainly one among several approaches (they also mention this in the paper). Personally, I lack the expertise to address this issue theoretically in a way that also works in practice.

At the moment, @ricardoV94's suggestion of using the standardised log of RTs with a StudentT likelihood is what works best for me. I also agree with his suggestion that the Lognormal over untransformed RTs may not be working due to issues with priors. However, I’m using weakly informative priors for both intercepts and slopes, i.e. very conventional normal distributions in a non-centred parametrisation, not too far from the aforementioned paper, where normal priors are the choice for the hierarchical model they implemented. So, I remain slightly puzzled that the Lognormal and ExGaussian cause issues while the StudentT on standardised log RTs works so well. Thanks again for the suggestions.
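For reference, the StudentT version I mean is roughly something along these lines: the same non-centred structure as the ExGaussian model above, with only the likelihood swapped (reusing the zlogRT placeholder from the sketch above; nu=10 is just an illustrative fixed value, not necessarily what I settled on):

with pm.Model() as mod_t:

    # non-centred, per-condition intercepts
    alpha_m = pm.Normal('alpha_m', 0, 1)
    alpha_s = pm.HalfNormal('alpha_s', 1)
    alpha_t = pm.Normal('alpha_t', 0, 1, shape=conds)
    alpha = pm.Deterministic('alpha', alpha_m + alpha_t*alpha_s)

    # non-centred, per-condition slopes
    beta_m = pm.Normal('beta_m', 0, 1)
    beta_s = pm.HalfNormal('beta_s', 1)
    beta_t = pm.Normal('beta_t', 0, 1, shape=conds)
    beta = pm.Deterministic('beta', beta_m + beta_t*beta_s)

    mu = pm.math.dot(inter, alpha) + pm.math.dot(slope, beta)*Worry
    sigma = pm.Exponential('sigma', 5)

    # StudentT likelihood on the standardised log RTs; an exponential hyperprior
    # on nu would be the alternative mentioned above
    y = pm.StudentT('zlogRT', nu=10, mu=mu, sigma=sigma, observed=zlogRT)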