Hello PyMC Community!
I wanted to share a project I’ve been working on: Gumbi, the Gaussian Process Model Building Interface. Inspired by what Bambi does for generalized linear models, the goal of Gumbi is to make it easy and intuitive to quickly prototype Gaussian Process models. Specifically designed for tabular data (pandas DataFrames), Gumbi takes care of transforming, standardizing, and reshaping data before feeding it into an auto-built PyMC model, and then undoing all of that wrangling when it comes to making predictions. Gumbi also includes visualization tools for easy plotting of predictions. For now there is only a PyMC backend, but I may add a GPflow backend as well.
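To give a rough idea of what the "transform, standardize, then undo" round trip involves, here is a minimal NumPy sketch. It is not Gumbi's actual implementation: the function names `to_model_space` and `from_model_space` are hypothetical, and the example values are made up for illustration.

```python
import numpy as np

def to_model_space(values, log=False, logit=False):
    """Transform (optionally log or logit), then standardize to z-scores.

    Returns the z-scores plus the (mean, std) needed to invert the step.
    Hypothetical helper for illustration only.
    """
    values = np.asarray(values, dtype=float)
    if log:
        t = np.log(values)                    # Log-Normal variable
    elif logit:
        t = np.log(values / (1.0 - values))   # Logit-Normal variable
    else:
        t = values                            # Normal variable
    mean, std = t.mean(), t.std()
    return (t - mean) / std, (mean, std)

def from_model_space(z, stats, log=False, logit=False):
    """Undo the standardization, then undo the transform."""
    mean, std = stats
    t = np.asarray(z, dtype=float) * std + mean
    if log:
        return np.exp(t)
    if logit:
        return 1.0 / (1.0 + np.exp(-t))
    return t

# Made-up horsepower values, treated as a log-transformed variable
horsepower = np.array([130.0, 165.0, 150.0, 95.0, 88.0])
z, stats = to_model_space(horsepower, log=True)
recovered = from_model_space(z, stats, log=True)
```

The model only ever sees `z`; predictions made in that space are mapped back with the stored statistics, so results come out in the original units.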
As shown in the quickstart, Gumbi reduces building, fitting, predicting, and visualizing a model down to a few simple commands.
Read in some data and store it as a Gumbi `DataSet`:

```python
import gumbi as gmb
import seaborn as sns

cars = sns.load_dataset('mpg').dropna()
ds = gmb.DataSet(cars,
                 outputs=['mpg', 'acceleration'],
                 log_vars=['mpg', 'acceleration', 'weight', 'horsepower', 'displacement'])
```
Create a Gumbi `GP` object and fit a model that predicts mpg from horsepower:

```python
gp = gmb.GP(ds)
gp.fit(outputs=['mpg'], continuous_dims=['horsepower']);
```
Make predictions and plot!

```python
X = gp.prepare_grid()
y = gp.predict_grid()
gmb.ParrayPlotter(X, y).plot()
sns.scatterplot(data=cars, x='horsepower', y='mpg',
                color=sns.cubehelix_palette()[-1], alpha=0.5);
```
Gumbi can handle multiple continuous input dimensions, multiple categorical input dimensions (correlated via a Linear Model of Coregionalization), and multiple outputs (also via LMC), along with Normal, Log-Normal, and Logit-Normal variables. Right now, only Marginal and MarginalSparse (DTC) implementations are supported, but I hope to add Latent variable support to allow for classification, heteroskedasticity, and other likelihood structures.
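For anyone unfamiliar with coregionalization, here is a small NumPy sketch of the simplest case: a single-term LMC (often called the intrinsic coregionalization model), where the full covariance is the Kronecker product of a between-output matrix `B` and an ordinary kernel over the inputs. This is only an illustration of the idea, not how Gumbi builds its models internally; the helper names, the `W`/`kappa` parameterization of `B`, and all numbers are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def coregion_matrix(W, kappa):
    """Between-output covariance B = W W^T + diag(kappa).

    W (n_outputs x rank) carries shared structure across outputs;
    kappa (n_outputs,) adds independent variance per output.
    """
    return W @ W.T + np.diag(kappa)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 5)            # five shared input locations
n_outputs = 2                       # e.g. two outputs, or two category levels
W = rng.normal(size=(n_outputs, 1))
kappa = np.array([0.1, 0.1])

B = coregion_matrix(W, kappa)       # (2, 2) correlation structure across outputs
K_x = rbf_kernel(x, x)              # (5, 5) covariance across inputs
K = np.kron(B, K_x)                 # (10, 10) joint covariance, grouped by output
```

The off-diagonal entries of `B` control how much one output (or category level) informs the other; with `W` set to zero the outputs become independent GPs that share only the input kernel.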
There’s still a lot of room for improvement, but the basic structure and functionality are in place. It’s available on PyPI, and I’ll try to put it on conda-forge soon. Take a look at the docs, which have thorough API documentation and example notebooks. I plan to add more explanations of some of the underlying data structures, so please let me know if anything is unclear!
I’m still developing the package, and contributions are welcome! This is my first open-source project, so bear with me while I figure it out as I go. I’m very open to suggestions on everything from API structure to inner workings and git guidelines. There are many features I still plan to implement, some of them obvious, so please open issues requesting specific features as this will help me prioritize.
And finally, let me know what you make with it! I’m excited to see how it does out “in the wild”.