I’d love to receive feedback on a project proposal for pymc3 for GSoC 2018. The idea came out of a conversation with Thomas Wiecki last week about a summer project on cognitive neuroscience modeling with hierarchical Bayesian methods (and was inspired by the existing GSoC project for ABC). This is the project abstract:
HDDM is a pymc2-based Python library for hierarchical Bayesian parameter estimation of the Drift Diffusion Model, a model used in cognitive neuroscience to study decision making. After conversations with Michael J. Frank and Thomas Wiecki, we came up with the following project proposal, intended to migrate HDDM to pymc3 to support its continued development and extend its capabilities. As I envision the process, if I document it thoroughly enough, my work could also serve as a guide for domain experts in other (non-statistics) disciplines who want to implement custom likelihoods and toolboxes for recurring problems in their fields. I then intend to begin researching and implementing ABC methods as part of pymc3, with an eye towards methods useful for the sort of problems encountered in cognitive neuroscience, such as hierarchical model fitting and regression between neural correlates and parameters.
The full proposal is available both on GitHub and on Google Docs, along with a longer description of my background and publicly available code samples.
It will be a great contribution! I still use HDDM and would love to sample it with a more efficient sampler.
A few comments:
We already have a GSoC project implementing ABC. I think we should coordinate on which algorithm to implement. cc @agustinaarroyuelo @aloctavodia
I guess you will also rely on patsy (similar to the glm module and Bambi)?
Absolutely. I would hate to duplicate efforts. I think the proposal mentioned ABC-SMC. Is that the current direction?
I imagine I would. From my (admittedly inexperienced) perspective, I might like to keep it behind the scenes for the simplest use cases, and expose it to more advanced ones? That appears to be the general design philosophy, although @twiecki would know much more.
Makes sense. I’ve never played with it; when I worked with Stan, it was directly through pystan/rstan.
1. Yes, SMC is the main algorithm, with a rejection sampler as a baseline implementation. What kinds of ABC algorithms are specific to HDDM?
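For reference, a rejection-sampler baseline of this kind can be sketched in plain Python. This is a minimal sketch, not pymc3's SMC or ABC code: it assumes only the drift rate v is unknown (boundary separation a, starting point w and non-decision time t0 held fixed), a Uniform(-3, 3) prior on v, an Euler-Maruyama simulation of the DDM, and hypothetical summary statistics (accuracy and mean RT) compared with an unweighted absolute distance. The tolerance is set implicitly by keeping the closest 5% of proposals. All function names are hypothetical.

```python
import random


def simulate_ddm(v, a, w, t0, n_trials, rng, dt=0.01, max_steps=10_000):
    """Euler-Maruyama simulation of the DDM with unit diffusion coefficient.

    v: drift rate, a: boundary separation, w: relative start in (0, 1),
    t0: non-decision time. Returns (rts, responses); response 1 = upper boundary.
    """
    rts, responses = [], []
    sqrt_dt = dt ** 0.5
    for _ in range(n_trials):
        x, steps = a * w, 0
        # Step the evidence until it leaves (0, a) or the step cap is reached.
        while 0.0 < x < a and steps < max_steps:
            x += v * dt + sqrt_dt * rng.gauss(0.0, 1.0)
            steps += 1
        rts.append(t0 + steps * dt)
        responses.append(1 if x >= a else 0)
    return rts, responses


def summarize(rts, responses):
    """Hypothetical summary statistics: accuracy and mean RT."""
    n = len(rts)
    return sum(responses) / n, sum(rts) / n


def abc_rejection(observed, simulate, sample_prior, n_proposals, keep_fraction, rng):
    """Rejection ABC: keep the prior draws whose simulated summaries are closest.

    The tolerance is implicit: it equals the distance of the last kept draw.
    """
    obs_stats = summarize(*observed)
    scored = []
    for _ in range(n_proposals):
        theta = sample_prior(rng)
        sim_stats = summarize(*simulate(theta, rng))
        distance = sum(abs(s - o) for s, o in zip(sim_stats, obs_stats))
        scored.append((distance, theta))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(keep_fraction * n_proposals))
    return [theta for _, theta in scored[:n_keep]]


# Usage: synthetic data with v = 1.0; a, w, t0 are fixed by assumption.
rng = random.Random(1)
A, W, T0, N_TRIALS = 1.0, 0.5, 0.3, 100
observed = simulate_ddm(1.0, A, W, T0, N_TRIALS, rng)
posterior = abc_rejection(
    observed,
    simulate=lambda v, r: simulate_ddm(v, A, W, T0, N_TRIALS, r),
    sample_prior=lambda r: r.uniform(-3.0, 3.0),  # Uniform(-3, 3) prior on v
    n_proposals=500,
    keep_fraction=0.05,
    rng=rng,
)
print(sum(posterior) / len(posterior))  # approximate posterior mean of v
```

Because `abc_rejection` only sees a simulator, a prior sampler and summary statistics, the same loop would apply to accumulator models without an analytical likelihood; only `simulate_ddm` and `summarize` are DDM-specific.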
2. I think that should be the general approach, although patsy is not as powerful as the formula tools in R (i.e., the one brms uses). I am not sure what the best route is here, but another option is to have a predefined model whose inputs are theano.shared variables, and to set the values of those shared variables to the user's input before inference.
In this case, since we have the likelihood, we can first see how efficient NUTS is. I would suggest you modify the ABC section of your proposal to mention benchmarking different inference algorithms and potentially implementing additional algorithms (like the Gibbs ABC mentioned in the above reference).
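Since the DDM has a known first-passage-time density, it can serve as the exact baseline that NUTS and any ABC approximation are benchmarked against. As a minimal sketch in plain Python (not HDDM's or pymc3's implementation), the functions below use the large-time series representation of the Wiener first-passage-time density (as given by Navarro and Fuss, 2009). The function names are hypothetical, the number of series terms is fixed rather than chosen adaptively (very small decision times would need more terms or the small-time series), and the parameterization (drift v toward the upper boundary, boundary separation a, relative start w, non-decision time t0) is an assumption.

```python
import math


def wiener_fpt_lower(t, v, a, w, n_terms=100):
    """First-passage-time density at the lower boundary (0) at decision time t.

    Large-time series; v is the drift toward the upper boundary a and
    w in (0, 1) is the starting point as a fraction of a.
    """
    if t <= 0:
        return 0.0
    series = 0.0
    for k in range(1, n_terms + 1):
        series += (k * math.exp(-(k * math.pi) ** 2 * t / (2 * a ** 2))
                   * math.sin(k * math.pi * w))
    return (math.pi / a ** 2) * math.exp(-v * a * w - v ** 2 * t / 2) * series


def wiener_fpt_density(rt, response, v, a, w, t0, n_terms=100):
    """Density of an observed RT; response 1 = upper boundary, 0 = lower."""
    t = rt - t0  # decision time after removing non-decision time
    if response == 1:
        # The upper boundary is the lower boundary of the mirrored process.
        return wiener_fpt_lower(t, -v, a, 1.0 - w, n_terms)
    return wiener_fpt_lower(t, v, a, w, n_terms)


def ddm_log_likelihood(rts, responses, v, a, w, t0):
    """Sum of log densities over independent trials."""
    total = 0.0
    for rt, response in zip(rts, responses):
        density = wiener_fpt_density(rt, response, v, a, w, t0)
        if density <= 0.0:
            return -math.inf  # e.g. an RT shorter than t0
        total += math.log(density)
    return total


print(ddm_log_likelihood([0.5, 0.7], [1, 0], v=1.0, a=1.0, w=0.5, t0=0.2))
```

A pymc3 version would need the same computation expressed with theano operations so that NUTS can obtain gradients; that translation is not shown here.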
Of course, benchmarking sounds like the right way to begin. Part of the appeal of ABC methods is that they would eventually allow expanding HDDM beyond the DDM and into alternative models for 2AFC tasks (or other decision-making tasks) without an analytical likelihood.
I’ll make that change shortly. Anything else you’d suggest changing in or adding to the proposal?
In fact, I recently examined a PhD thesis that used all kinds of crazy workarounds to do Bayesian inference on temporal accumulation models with no analytical likelihood. A whole bunch of cognitive science would become easier if this were doable in a more automatic fashion in PyMC3. I see the focus is on HDDM, but presumably the core functionality would enable ABC on a range of different model classes?
Yes, but that is another GSoC 2018 project. The plan is to extend the SMC sampler into an ABC-SMC algorithm. This project (if accepted) will be done by @agustinaarroyuelo, supervised by @aloctavodia and me.
@drbenvincent, I think our goal indeed would be to enable fitting different model classes, between this project and the other one @junpenglao mentioned.
I imagine we’ll try to formulate it so that code specific to the DDM (and very similar models) sits in HDDM and generic code bubbles its way up to PyMC3, but we’ll be wiser about it in a few months (assuming the project is accepted).
Agreed. I think the proposals should not overlap. But if everything runs smoothly with both proposals (HDDM and ABC-SMC), we could try to make them work together, either as part of the final stage of GSoC or after it.
@aloctavodia I agree as well, trying to make them converge towards the end of the summer or the fall makes sense.
@junpenglao I added a more explicit elaboration of benchmarking ABC approaches with the DDM to the project proposal. Anything else that might be worth mentioning?