I am a final-year master's student in statistics; I also studied computer science and statistics for my bachelor's degree. I was lucky to get a glimpse of Bayesian statistics during my studies and became fascinated by the field. Afterwards, I tried building some models with PyMC and NumPyro, and I was amazed at how much these tools simplify the modeling process. I think GSoC would be a great opportunity for me to contribute and learn more. I am now looking for a suitable topic to draft a proposal, and it would be great if any of you could give me some advice.
To give you a sense of my background, I have used PyMC to build a change point model (hierarchy-bayesian-modeling-time-series-sensor/demonstration.ipynb), and NumPyro with JAX to build an imputation model for a questionnaire survey and several Gaussian process survival models for my studies.
Basically, my goal is to build something great, consolidate my statistical knowledge, solve complicated problems, and learn something new, powerful, and practical. I would also like to keep contributing after GSoC.
I am currently learning JAX, and its speed surprised me. I also read the post "MCMC for big datasets: faster sampling with JAX and the GPU" from PyMC Labs (pymc-labs.io), so it seems that it is possible to use JAX in PyMC. Is this project about speeding up sampling with JAX? The description states “There are now 2 JAX samplers available to be used in sampling_jax.py. It would be nice to add these to the standard sample() method.”, but I cannot find sampling_jax.py when I search the repo. Could you give me a brief idea of what this is and what needs to be done?
The idea seems quite interesting, and I like working on complicated tasks. The requirements say that “People working on this project will need to be familiar with the Projection predictive methodology”, but I had not heard of it before. Do you think there is enough time for me to pick this up in the coming days before the application? Or should I give up on this one?
For your information, I am also looking into the following topics. I am still studying them and reading the related issues, so I have no comments or thoughts on them yet.
Add better support for time-series models
Fast Exact Gaussian Processes
Automatic marginalization of discrete variables
Support for more types of Mixtures
RV convolutions (marginalization of some continuous variables)
Increase support for batched multivariate distributions