Hi everyone,
My name is Justin Choi, and I’m a recent graduate of WashU. I currently work for the Cleveland Guardians as an Associate Data Scientist. I’ve been an avid user of PyMC for about a year now but am a beginner at open source development, so this year’s GSoC seemed like an opportunity not only to hone my own skills but also to give back to a community that I’ve grown to appreciate. Projects that I’ve completed using PyMC include a pitch location optimization framework, a player aging curve with a custom Heckman correction likelihood, and pitch quality evaluation using BART and random effects.
Beyond baseball, I’m also interested in spatial statistics. One of my former professors, Joe Guinness, is relatively well known for his contributions to approximate GPs and got me hooked. I was actually planning to implement the Nearest-Neighbor GP on my own before I found out that it was one of the recommended topics! I spent this week (1) implementing it using a combination of NumPy and SciPy and (2) validating the results against a full GP. Thankfully, the log-likelihoods lined up. The notebook can be found here.
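As a rough illustration of that kind of check, here is a minimal sketch. It is not the code from the notebook: the squared-exponential kernel, the hyperparameters, the simulated data, and the names `sq_exp_kernel` and `nngp_logpdf` are all assumptions made for the example. It uses the response form of the approximation, where each observation conditions on at most `m` nearest neighbors among the points that precede it in the ordering.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.stats import multivariate_normal


def sq_exp_kernel(a, b, lengthscale=0.5, variance=1.0):
    """Squared-exponential covariance between two sets of points (assumed kernel)."""
    return variance * np.exp(-0.5 * cdist(a, b, "sqeuclidean") / lengthscale**2)


def nngp_logpdf(y, x, m, noise=0.1):
    """Nearest-neighbor approximation to the zero-mean GP log-density.

    Point i conditions on at most m nearest neighbors among points 0..i-1.
    `noise` is the observation noise variance added to the diagonal.
    """
    total = 0.0
    for i in range(len(y)):
        k_ii = sq_exp_kernel(x[i:i + 1], x[i:i + 1])[0, 0] + noise
        if i == 0:
            # The first point has no predecessors, so use its marginal.
            mean, var = 0.0, k_ii
        else:
            dists = cdist(x[i:i + 1], x[:i])[0]
            nbrs = np.argsort(dists)[:m]
            k_nn = sq_exp_kernel(x[nbrs], x[nbrs]) + noise * np.eye(nbrs.size)
            k_in = sq_exp_kernel(x[i:i + 1], x[nbrs])[0]
            w = np.linalg.solve(k_nn, k_in)
            mean = w @ y[nbrs]
            var = k_ii - w @ k_in
        # Univariate Gaussian log-density of y[i] given its neighbors.
        total += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return total


# Simulated data (hypothetical): 60 random locations in the unit square.
rng = np.random.default_rng(0)
n = 60
x = rng.uniform(0.0, 1.0, size=(n, 2))
cov = sq_exp_kernel(x, x) + 0.1 * np.eye(n)
y = rng.multivariate_normal(np.zeros(n), cov)

full_ll = multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(y)
approx_ll = nngp_logpdf(y, x, m=10)
exact_ll = nngp_logpdf(y, x, m=n - 1)  # every predecessor is a neighbor

print(f"full GP: {full_ll:.3f}  NNGP (m=10): {approx_ll:.3f}  NNGP (m=n-1): {exact_ll:.3f}")
```

With `m = n - 1` every point conditions on all of its predecessors, so the sum of conditionals is just the chain-rule factorization of the full density and the two log-likelihoods should agree up to floating-point error. Smaller `m` trades accuracy for speed, and the result then depends on the ordering of the points.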
Admittedly, I’m not yet comfortable with the mechanisms of PyTensor. If I do get accepted into GSoC, I plan to spend the first few weeks becoming thoroughly familiar with them, then converting my existing NumPy-oriented code to PyTensor. Once that’s taken care of, I believe that getting the NNGP implemented in PyMC will be relatively straightforward, though of course there’ll be unexpected roadblocks along the way.
The main thing that’s holding me back is the time commitment. Due to my job with the Guardians, I won’t be able to spend 350 hours across the busy summer months. I strongly prefer a 175-hour commitment, but wanted to make sure that was possible before submitting a proposal. If not, I’m also happy to work on the NNGP on my own and ask the community for review once finished. Thanks in advance!