I want to share the first publication in which we use PyMC3 to deconvolve the genome: "CUT&Tag2for1: a modified method for simultaneous profiling of the accessible and silenced regulome in single cells". We implemented a deconvolution tool called 2for1separator, using PyMC3, that splits a single genome-wide measurement into epigenetic marks indicating either activity or repression of genes.
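To make "deconvolution" a bit more concrete, here is a toy sketch. It is not the 2for1separator model, and the setup is purely an assumption for illustration. Suppose every base position yields two observed channels that are linear mixtures of two latent mark signals, with a known, hypothetical mixing matrix M, plus Gaussian noise, and put a Gaussian prior on the marks. The posterior is then Gaussian, so its mode (the kind of point estimate PyMC3's find_MAP searches for numerically) has a closed form that can be solved independently at every position. The function name map_unmix and all numbers are made up.

```python
import numpy as np


def map_unmix(Y, M, noise_sd=1.0, prior_sd=10.0):
    """Closed-form MAP estimate of latent mark signals in a toy linear-Gaussian model.

    Hypothetical model (NOT the 2for1separator model), per base position i:
        Y[i] = M @ S[i] + eps,  eps ~ N(0, noise_sd**2 * I),  S[i] ~ N(0, prior_sd**2 * I)
    Y: (n_positions, n_channels) observations; M: (n_channels, n_marks) mixing matrix.
    """
    Y = np.asarray(Y, dtype=float)
    M = np.asarray(M, dtype=float)
    ratio = (noise_sd / prior_sd) ** 2
    # Setting the gradient of the log posterior to zero gives
    # (M^T M + ratio * I) S[i] = M^T Y[i]; solve it for all positions at once.
    A = M.T @ M + ratio * np.eye(M.shape[1])
    return np.linalg.solve(A, M.T @ Y.T).T


# Simulated example: 1000 positions, two hypothetical marks, two observed channels.
rng = np.random.default_rng(0)
M = np.array([[0.8, 0.3],
              [0.2, 0.7]])  # hypothetical mixing proportions
S_true = rng.gamma(2.0, 2.0, size=(1000, 2))
Y = S_true @ M.T + rng.normal(0.0, 0.1, size=(1000, 2))
S_hat = map_unmix(Y, M, noise_sd=0.1, prior_sd=10.0)
```

The Gaussian prior ignores that real signals are non-negative; a realistic model needs a better-suited prior or constraint, and at that point a numerical optimizer such as find_MAP takes over from the closed form.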
This will help us better understand the mechanisms that govern gene activation and fight diseases associated with malfunctions of these mechanisms.
Technically, this application of PyMC3 is an extreme case. Because the genome is so large and we deconvolve the signal at every base position, the dimensionality goes into the millions. A preparation step splits the genome into tiles, which are deconvolved in independent work chunks so that everything fits into memory. The posterior is normally distributed, so we used find_MAP, which gave us the fastest results. We are now working on reducing the memory demand so that we can fit more tiles into each work chunk; this lowers the risk that parameters which should be shared by all tiles diverge to different values in different work chunks. Most of the compute time is currently spent on compilation, but this may change in the future if Numba or JAX can be used as a backend. I am very open to critical discussion of how we applied PyMC.
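The tiling step could look roughly like the following sketch. The names make_tiles and pack_chunks, the tile size, the chromosome names and the use of a per-chunk base count as a stand-in for memory are all hypothetical; the real memory footprint depends on the model and its compiled graph.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end): 0-based, half-open interval. Hypothetical representation.
Tile = Tuple[str, int, int]


def make_tiles(chrom_sizes: Dict[str, int], tile_size: int) -> List[Tile]:
    """Cut every chromosome into consecutive tiles of at most tile_size bases."""
    tiles: List[Tile] = []
    for chrom, length in chrom_sizes.items():
        for start in range(0, length, tile_size):
            tiles.append((chrom, start, min(start + tile_size, length)))
    return tiles


def pack_chunks(tiles: List[Tile], max_bases_per_chunk: int) -> List[List[Tile]]:
    """Greedily group consecutive tiles into work chunks.

    A chunk is closed as soon as adding the next tile would exceed
    max_bases_per_chunk (a crude, hypothetical proxy for memory use).
    A single tile larger than the budget still gets its own chunk.
    """
    chunks: List[List[Tile]] = []
    current: List[Tile] = []
    current_bases = 0
    for tile in tiles:
        n_bases = tile[2] - tile[1]
        if current and current_bases + n_bases > max_bases_per_chunk:
            chunks.append(current)
            current, current_bases = [], 0
        current.append(tile)
        current_bases += n_bases
    if current:
        chunks.append(current)
    return chunks


# Toy example with two made-up chromosomes.
tiles = make_tiles({"chrA": 2500, "chrB": 1200}, tile_size=1000)
chunks = pack_chunks(tiles, max_bases_per_chunk=2000)
# Each chunk would then be fitted separately (e.g. its own model and find_MAP call).
```

Raising max_bases_per_chunk yields fewer, larger chunks, so parameters that should be shared across tiles are estimated independently fewer times; that is exactly the trade-off against memory mentioned above.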
I am very happy with this project and immensely grateful to the awesome PyMC community!! Please keep up the great work. I can't wait to see what will be possible with PyMC v4 in future projects.