GSoC 2026: Interest in Scalable Online Bayesian State Space Models Project

Hi Everyone,
My name is Shraddha Sharma. I graduated with a Master’s degree in Robotics and Artificial Intelligence from Nanyang Technological University, Singapore in 2020.

I am writing to express my interest in the Scalable Online Bayesian State Space Models project. It caught my attention because solving this problem would also let me address a longstanding problem of interest to me that came out of my master's thesis.

My thesis work involved developing a singing assistance tool for beginners (motivated by my own experience of learning to sing), and I published this work. After graduating, I worked on various research projects analyzing time-series signals across different domains, and later worked as a Data Scientist at a digital therapeutics company, analyzing electrocardiogram (ECG) signals. A few months ago, I attended the Recurse Center, a programming retreat in New York, to further develop the singing assistance tool from my master's thesis. It was there that I started applying Bayesian inference to my audio time-series signal (my master's work had originally used classical signal processing methods). Through one of the Recursers, I learned about Stan and PyMC for probabilistic inference, and I started developing a Bayesian Hidden Markov Model for analyzing the audio signal.

This month I found out that PyMC participates in GSoC as an organization (through NumFOCUS), and I was happy to find that the PyMC GSoC community appreciates prospective contributors sharing their PyMC use cases through notebooks. Here is my singing assistance tool notebook, which uses PyMC for probabilistic inference of the audio states (for now it infers two states: silence and singing). Currently, it lacks temporal information. To add it, I tried to implement a naive Bayesian Hidden Markov Model; however, as expected, the strong dependency chain between time steps produces a huge PyTensor graph, which makes it inefficient. I then realised that pymc-extras has flexible state space model implementations that could help me address the temporal aspect, and I am currently exploring these.

In parallel, when I went through the project ideas that PyMC has listed, the Scalable Online Bayesian State Space Models project caught my attention, as this is the problem I am trying to solve. My first thought was that it would enable the singing tool to overcome the strong dependency chain issue, scale the model efficiently, and provide real-time feedback as the user sings. I was happy to discover this idea at a time when I was actively looking for ways to address the limitations of my singing tool. Currently, in the PyMC framework, Markov Chain Monte Carlo (MCMC) sampling is run on the full model each time new data arrives, which does not scale for online inference. As mentioned in the description of the Scalable Online Bayesian State Space Models project, I plan to implement marginalisation of latent states and structured linear algebra to solve this problem.
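To make the idea of marginalising latent states concrete for a two-state model such as silence/singing, here is a minimal pure-Python sketch of the HMM forward recursion. It is only an illustration: the function names, the Gaussian emission model, and all parameter values are my own assumptions, and none of this is PyMC or pymc-extras API. The point is that each new observation is folded in with a constant-cost update instead of revisiting the whole sequence.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def gaussian_logpdf(x, mean, sd):
    """Log-density of a normal distribution (assumed emission model)."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)

def forward_step(log_alpha, x, log_trans, means, sds):
    """Fold one new observation x into the running forward messages.

    log_alpha[i] is log p(x_1..x_{t-1}, z_{t-1} = i); the result is the same
    quantity for time t. Cost per step is O(K^2) for K states.
    """
    n_states = len(log_alpha)
    updated = []
    for j in range(n_states):
        predicted = log_sum_exp(
            [log_alpha[i] + log_trans[i][j] for i in range(n_states)]
        )
        updated.append(predicted + gaussian_logpdf(x, means[j], sds[j]))
    return updated

def hmm_log_likelihood(xs, log_init, log_trans, means, sds):
    """Log-likelihood of xs with the discrete states summed out."""
    log_alpha = [
        log_init[j] + gaussian_logpdf(xs[0], means[j], sds[j])
        for j in range(len(log_init))
    ]
    for x in xs[1:]:
        log_alpha = forward_step(log_alpha, x, log_trans, means, sds)
    return log_sum_exp(log_alpha)

# Hypothetical two-state setup: state 0 = silence, state 1 = singing.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.3), math.log(0.7)]]
frame_energy = [0.1, 0.2, 2.9, 3.1, 0.0]  # made-up feature values
total = hmm_log_likelihood(frame_energy, log_init, log_trans,
                           means=[0.0, 3.0], sds=[1.0, 1.0])
```

In an online setting, only `log_alpha` would need to be stored between audio frames, and `forward_step` would be called once per incoming frame.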

Lastly, I would like to mention that I have recently been studying randomized algorithms, which I would like to apply to the scalability issue in this project, specifically Freivalds' algorithm (a randomized check of matrix products) for the matrix operations involved in the project.
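For readers unfamiliar with it, Freivalds' algorithm checks whether A·B equals a candidate matrix C using only matrix-vector products, so each round costs O(n^2) instead of the O(n^3) of recomputing the product. Below is a minimal sketch for integer matrices stored as lists of lists; the function names are my own, and how (or whether) this fits into the project's linear algebra is still an open question on my side.

```python
import random

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]

def freivalds(A, B, C, rounds=20, rng=None):
    """Probabilistically check whether A @ B == C.

    Returns False as soon as a random 0/1 vector exposes a mismatch. If it
    returns True, the product is correct with error probability at most
    2 ** -rounds. Exact comparison is used, so this sketch assumes integer
    entries; floating-point data would need a tolerance instead.
    """
    rng = rng or random.Random()
    n_cols = len(C[0])
    for _ in range(rounds):
        r = [rng.randint(0, 1) for _ in range(n_cols)]
        # Compare A(Br) with Cr without ever forming A @ B.
        if mat_vec(A, mat_vec(B, r)) != mat_vec(C, r):
            return False
    return True

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C_good = [[19, 22], [43, 50]]
C_bad = [[19, 22], [43, 51]]
ok = freivalds(A, B, C_good)
```

A correct product always passes, while an incorrect one such as `C_bad` is rejected with probability at least 1/2 in each round.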

Thank you for your time.

Best Regards,
Shraddha Sharma