Modeling human response time data with an exponential model: model comparison issues

Hi everyone!

I wanted to build on @AlexAndorra’s answer with some specific examples.

I used this same model to perform some experiments on handling information criteria calculations for models with multiple observed variables in ArviZ. I uploaded it to GitHub some days ago; I wanted to finish the work and add explanations for each of the steps but could never find the time. Today I dedicated some time to this other answer, since the situation is quite similar and also incorporates a hierarchical structure (which adds some extra possibilities to the mix).

I would recommend first reading the other Discourse thread. However, as it is quite long, if you don’t have the time or the motivation, I copied the main takeaway below (from what I have gathered in other Discourse threads and GitHub issues):

When working with several observed variables, there is not a single way of computing waic/loo, nor of performing cross-validation.

As said above, from the ArviZ side we’ll try to simplify the workflow; however, there is no unique way to do the computation, so it will still require users to define explicitly which predictive task they are interested in.

Disclaimer: I have not checked the exchangeability criteria of the examples below. I use them only to illustrate ArviZ+xarray usage, not statistical correctness.


The GitHub link above goes to a folder with several files in it:

  • pymc3_example shows 4 possibilities I came up with without any domain knowledge about the model; some of them may make no sense or may be incorrect (see the disclaimer above!). I hope the code itself is self-explanatory enough.
  • pystan_example is purely a translation of pymc3_example, but only with the exponential model (and therefore no model comparison either). If you are interested in information criteria, it can be interesting to read. I hope it can be followed even though it is written in Stan; I tried to make the model as similar as possible to the PyMC3 one. I think it will be useful to people not familiar with Stan as a step before going into the next two notebooks.
  • pystan_exact_loo_subject and pystan_exact_loo_observation compare the PSIS-LOO obtained by ArviZ after combining the pointwise log likelihood values against exact cross-validation. If you want to play with them, keep in mind that the model is refitted once per excluded observation.

In this example, the results of leave one observation out and leave one subject out are extremely similar. This can happen, but do not rely on the two values being similar for every model.
