Introduction: GSoC 2026 Contributor — Diego Medina

Hi everyone! I’m Diego Medina, a Data Science student at ESCOM (Instituto Politécnico Nacional) in Mexico City. I’m interested in contributing to PyMC for GSoC 2026.

Background

I’m currently pursuing my degree in Data Science (expected graduation 2027) with a focus on machine learning, NLP, and probabilistic programming. I have practical experience building full-stack AI applications using Transformers (Hugging Face), Scikit-learn, PyTorch, and Flask/React. I also have experience with scientific Python libraries like NumPy, Pandas, and SciPy.

Before switching to Data Science, I studied Computer Engineering at UPIICSA-IPN, which gave me a solid foundation in algorithms, data structures, and software architecture.

Open Source Contributions

  • PyMC PR #7992: Expanded the RandomWalk class docstrings with a mathematical formulation, parameter descriptions, and NumPy-style docstring formatting. Currently in review.
  • PyMC PR #8060: Clarified the import path for as_xtensor in the dims error message (issue #8009). The fix was merged via a related PR.
  • SciPy PR #24146: Improved the DIA sparse format documentation on diagonal alignment and data mapping. In review with positive feedback from maintainers, targeting the 1.18.0 milestone.

I’m actively looking for more issues to contribute to in PyMC and PyTensor.

Interests

I’m exploring the project ideas list to find the best match for my skills. Given my background in data science, ML, and NLP, I’m particularly interested in projects that involve statistical modeling and Bayesian inference. I’d appreciate any guidance from mentors on which projects could benefit from my experience.

I’m committed to making meaningful contributions to PyMC both during and beyond GSoC!

GitHub: Diegomed11

Hi Diego

Nice to meet you, glad to hear you’re interested in GSoC.

For my part, I think it would be more interesting to see you use PyMC or PyTensor to work on problems that you are interested in. Have you ever worked with Bayesian models? You could fit a hierarchical model on some data that you personally find interesting and share the results. If you’re interested in NLP, you could do something like infinite Shakespeare using PyTensor.

I recognize that GSoC asks for PRs and contributions, but if you don’t have any hands-on experience with our ecosystem, I think it would be more beneficial for you to spend some time playing with it. I would definitely not hold it against your application if you did that instead of PRs.

Hi Chris, thank you for the guidance. I’m going to build a Bayesian survival model on the mastectomy dataset (a Weibull AFT model with censored observations handled via pm.Potential) and compare its posteriors against lifelines’ frequentist estimates. I’ll share the notebook here in the next few days.

Hi Diego, please read GSoC 2026: How to Get Involved with PyMC — Please Read Before Posting - #2

Hi Jesse (apologies for getting your name wrong earlier!), and thanks to aloctavodia for pointing me to the GSoC guide; it was very helpful.

I built the Bayesian Weibull AFT model I mentioned. I used the mastectomy dataset, implemented the censored likelihood with pm.Potential, and compared the posteriors against lifelines’ frequentist estimates.
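A minimal sketch of the kind of censored log-likelihood such a model adds through pm.Potential, assuming the standard Weibull AFT parameterization: log T = Xβ + σW with W a standard minimum-Gumbel variable, so T is Weibull with shape k = 1/σ and scale exp(Xβ). Observed deaths contribute the log density and censored patients contribute the log survival function, which combine into Σ[dᵢ log h(tᵢ) − H(tᵢ)], where h is the hazard and H the cumulative hazard. The helper `weibull_aft_loglik` and the toy numbers are hypothetical and not taken from the notebook, which may use a different but equivalent parameterization.

```python
import numpy as np


def weibull_aft_loglik(t, event, X, beta, sigma):
    """Censored Weibull AFT log-likelihood (hypothetical helper).

    t     : survival or censoring times, shape (n,)
    event : 1 if death observed, 0 if right-censored, shape (n,)
    X     : design matrix (intercept + covariates), shape (n, p)
    beta  : AFT coefficients, shape (p,)
    sigma : AFT scale; the Weibull shape is k = 1 / sigma
    """
    k = 1.0 / sigma                     # Weibull shape
    log_lam = X @ beta                  # log Weibull scale = linear predictor
    log_t = np.log(t)
    # log hazard: log h(t) = log k - log lam + (k - 1) * (log t - log lam)
    log_hazard = np.log(k) - log_lam + (k - 1.0) * (log_t - log_lam)
    # cumulative hazard: H(t) = (t / lam) ** k, computed on the log scale
    cum_hazard = np.exp(k * (log_t - log_lam))
    # events add log f = log h - H; censored rows add log S = -H
    return float(np.sum(event * log_hazard - cum_hazard))


# Toy data (made up for illustration): times, event flags, intercept + binary covariate
t = np.array([5.0, 12.0, 30.0, 48.0])
event = np.array([1, 1, 0, 1])          # 0 = right-censored
X = np.column_stack([np.ones(4), np.array([1.0, 1.0, 0.0, 0.0])])
print(weibull_aft_loglik(t, event, X, beta=np.array([3.5, -0.8]), sigma=0.9))
```

Inside a PyMC model, the same expression written with `pytensor.tensor` operations (for example `pt.log` and `pt.exp` in place of their NumPy counterparts), with priors on `beta` and `sigma`, is what would be passed to `pm.Potential("censored_loglik", ...)`.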

One thing I found interesting: the frequentist model gives p = 0.08 for the metastasis effect, which is not significant at the 5% level. The Bayesian posterior, however, gives a 97.2% probability that metastasis reduces survival time. With only 44 patients, the full posterior is much more informative than a binary significance test.
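A posterior probability like the 97.2% above is typically estimated as the share of posterior draws in which the metastasis coefficient is negative. A small sketch, assuming an AFT parameterization where log survival time increases with the linear predictor (so a negative coefficient means shorter survival) and that the draws have already been pulled out of the trace as an array; `prob_negative` and the simulated draws are illustrative stand-ins, not the notebook's posterior.

```python
import numpy as np


def prob_negative(draws):
    """Monte Carlo estimate of P(beta < 0 | data) from posterior draws."""
    draws = np.asarray(draws, dtype=float).ravel()  # flatten chains x draws
    return float(np.mean(draws < 0))


# Simulated stand-in for posterior draws of the metastasis coefficient
# (hypothetical values, NOT the notebook's results): 4 chains x 1000 draws
rng = np.random.default_rng(0)
fake_draws = rng.normal(loc=-0.5, scale=0.3, size=(4, 1000))

print(prob_negative(fake_draws))        # estimated P(beta_met < 0)
print(np.median(np.exp(fake_draws)))    # posterior median time ratio exp(beta_met)
```

In an AFT model exp(β) is the multiplicative effect on survival time, so the last line summarizes the same draws as a time ratio; values below 1 indicate shorter survival.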

Here’s the notebook: https://github.com/Diegomed11/bayesian-survival-pymc

I’m now working on my proposal draft for the Survival Models project and will send it to the GSoC email soon.