Hi all. I’m a lurker in this community and just noticed this nice Discourse tool. This looks like a great forum.
I’ve been looking at the Machine Learning frameworks and the state of array computing in Python and it looks like we have had some wonderful spreading of capability over the past several years.
Meanwhile, a few of us have been working on the foundations so that a better system could emerge at some point. We have made a lot of progress in Numba, Dask, and more recently XND. There is also the interesting work around CuPy, Bohrium, and other GPU arrays. In addition, the general automatic differentiation story has been reifying.
Of course, at the same time, TensorFlow, MXNet, PyTorch, and a few others have also defined ndarray concepts of their own, typically interfacing with NumPy and its more rigid type system and writing a new function system outside of NumPy’s “ufunc” system.
In my mind, it is a good time for a new low-level “ufunc” system for array objects in Python as well as a refactoring of the capabilities of NumPy into lower-level components that can be re-used by things like PyTorch, Pyro, and other high-level systems.
I can see an opportunity for PyMC4 to be a direct “customer” of the work we are planning to do on combining Numba + CuPy + Tangent along with XND to provide a more flexible array-container concept.
XND is not quite ready for even alpha-level consumption (we have docs to write and more ufuncs to build), but the bones are all there now if you want to take a peek: https://github.com/plures (the name comes from ex uno plures, “from one comes many”, which is a play on inverting e pluribus unum). The name emphasizes the idea that plures is about refactoring the capabilities of NumPy into C libraries with Python interfaces that can then be re-used by many other systems. But we will be rolling out the idea in a few months under the “xnd” brand, which is the generic container that generalizes the NumPy container to things like variable-length arrays and is straightforward to extend with many other kinds of data types.
XND is Stefan Krah’s work product, but he and I have been collaborating on the architecture for the past 2 years, since I finally had an epiphany about what I felt the future of array computing should look like. That future meant an expansion of the buffer protocol to multiple languages and a refactoring of NumPy into its core components (dtype system, ufunc system, array container).
We plan to make an early alpha release by May and then have a beta release by the end of the summer.
To summarize (and throw in one more idea) I have three separate suggestions:
- PyMC4 could support NumPy + CuPy + Tangent as a framework to build on for the future with NumPy/CuPy arrays as the array object.
- PyMC4 could also start to provide optional support for XND for data-types and features that are not otherwise available.
- PyMC4 could support Dask for creating parallel workflows (if you look at how distributed Tensorflow is architected, for example, it looks very similar to Dask, except Dask’s Python API is arguably much cleaner). If other things from PyTorch are missing, then perhaps Chainer could be used as it uses NumPy for the array object and does not introduce another array concept (fortunately this time around we have an extended buffer protocol and so you can still share data between competing arrays).
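To make the buffer-protocol point a bit more concrete, here is a minimal sketch using only the Python standard library. It is not XND, NumPy, or CuPy code: `array.array` and `memoryview` simply stand in for two "competing" containers, and the variable names are made up for illustration. The idea it tries to show is that two objects can look at the same memory without copying it.

```python
from array import array

# A typed container that exports its memory through the buffer protocol.
source = array("d", [1.0, 2.0, 3.0, 4.0])

# A second object looking at the same memory; no copy is made.
view = memoryview(source)

# Reinterpret the same bytes as a 2x2 grid (cast goes through raw bytes).
grid = view.cast("B").cast("d", shape=[2, 2])

# A write through the original container is visible in both views...
source[0] = 10.0
# ...and a write through the flat view is visible in the container.
view[1] = 20.0

print(view.format, view.itemsize, view.shape)
print(source.tolist())
print(grid.tolist())
```

Libraries that export or consume buffers in this way can, in principle, hand data to each other in the same zero-copy fashion, which is what makes it practical to mix array objects rather than pick a single one.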
Thanks for reading this far. We have a plures gitter channel if you want to drop by and say hi: https://gitter.im/Plures/xnd