Dimensions for y variable in gp.marginal_likelihood

I’m working on matching color-space transforms between two cameras using their RGB data. Both cameras shot the same color charts, giving me around 1000 samples. I defined X as the camera A data (1008, 3) and y as the camera B data (1008, 3), but I get this error:

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 1008)

When I define y as a single channel (1008,1) it runs. Can the y variable have multiple columns in gp.marginal_likelihood?
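
A stripped-down sketch of the setup (the model and priors here are just illustrative, not my exact code):

import pymc3 as pm

X = rgb_A  # (1008, 3) camera A RGB values, assumed already loaded
Y = rgb_B  # (1008, 3) camera B RGB values

with pm.Model():
    ls = pm.Gamma('ls', alpha=2, beta=1, shape=3)
    cov = pm.gp.cov.ExpQuad(3, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal('sigma', sigma=0.1)
    # Raises the dimension mismatch above:
    # gp.marginal_likelihood('y', X=X, y=Y, noise=sigma)
    # Runs when y is a single channel:
    gp.marginal_likelihood('y_R', X=X, y=Y[:, 0], noise=sigma)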

Currently no… but we’re really trying to get this into the next major release. In the meantime, this might be helpful.

If you’re comfortable experimenting a little, I’ve written a package intended to make this sort of wrangling easier: https://github.com/JohnGoertz/Gumbi (Gumbi: Gaussian Process Model Building Interface). It uses pymc3 under the hood, but aims to take care of all the reshaping, transforming, and standardization needed.

I’ve been working on it for a while, but I’ve only just started making it public, uploading it to PyPI, hosting the documentation on ReadTheDocs, and so on. I feel it’s well documented, but at the moment you need to build the documentation locally. You can also just take a look at the few example notebooks: https://github.com/JohnGoertz/Gumbi/tree/main/docs/source/notebooks

In your case, you would store your data as a single tall DataFrame df with one column for “pixel” (0-1007, or whatever is appropriate), one column for “camera” (‘A’ or ‘B’), one column for “channel” (‘R’/‘G’/‘B’), and one column for “value” (probably best normalized to lie strictly between 0 and 1).
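
For instance, assuming the raw data are two (1008, 3) arrays rgb_A and rgb_B (hypothetical names, one per camera), that tall frame could be built like this:

import numpy as np
import pandas as pd

channels = ['R', 'G', 'B']
frames = []
for camera, arr in [('A', rgb_A), ('B', rgb_B)]:
    frames.append(pd.DataFrame({
        'pixel': np.repeat(np.arange(arr.shape[0]), 3),   # 0-1007, one block of 3 rows per pixel
        'camera': camera,
        'channel': np.tile(channels, arr.shape[0]),
        'value': arr.ravel(),                             # assumed already scaled into (0, 1)
    }))
df = pd.concat(frames, ignore_index=True)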

The steps with Gumbi would then just be:

import gumbi as gmb
import matplotlib.pyplot as plt

ds = gmb.DataSet(df, outputs=['value'], logit_vars=['value'])  # Omit logit_vars if your data is not normalized to (0, 1)

gp = gmb.GP(ds)

# You may need to pass `sparse=True` for your ~6k datapoints; 100 inducing points is the default, or set the number with `n_u`.
gp.fit(outputs=['value'], continuous_dims=['pixel'], categorical_dims=['camera', 'channel'], sparse=True)

# Make predictions for each combination of camera/channel and plot
X = gp.prepare_grid()
fig, axs = plt.subplots(2, 3, figsize=(18, 8))
for row, camera in zip(axs, ['A', 'B']):           # rows of the grid = cameras
    for ax, channel in zip(row, ['R', 'G', 'B']):  # columns = channels
        y = gp.predict_grid(categorical_levels={'camera': camera, 'channel': channel}, with_noise=False)
        gmb.ParrayPlotter(X, y).plot(ax=ax)

That setup will use a coregionalization kernel to learn correlations between all combinations of camera+channel, inspired by this other implementation by @bwengals.
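
If you’re curious what that looks like in raw pymc3, the rough shape of a coregionalized kernel (a hedged sketch, not Gumbi’s actual internals; priors are illustrative) is:

import pymc3 as pm

n_outputs = 6  # 2 cameras x 3 channels
rank = 2       # rank of B = W @ W.T + diag(kappa)

with pm.Model():
    # X would carry two columns: the continuous input and an integer output index (0-5)
    ls = pm.Gamma('ls', alpha=2, beta=1)
    cov_input = pm.gp.cov.ExpQuad(2, ls=ls, active_dims=[0])
    W = pm.Normal('W', mu=0, sigma=1, shape=(n_outputs, rank))
    kappa = pm.Gamma('kappa', alpha=1.5, beta=1, shape=n_outputs)
    coreg = pm.gp.cov.Coregion(2, W=W, kappa=kappa, active_dims=[1])
    gp = pm.gp.Marginal(cov_func=cov_input * coreg)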

The package is still in development, so let me know if you need any help or have any suggestions for improvement!


On second thought, I realized I misunderstood your system. More likely what you want is to build your dataframe with one column for “sample” (0-1007, if you want; we won’t actually use this), one column for “channel” (‘R’/‘G’/‘B’), one column for “cameraA”, and one for “cameraB”. The Gumbi model becomes

ds = gmb.DataSet(df, outputs=['cameraB'], logit_vars=['cameraA', 'cameraB'])  # Omit logit_vars if your data is not normalized to (0, 1)

gp = gmb.GP(ds)

# You may need to pass `sparse=True` for your ~3k datapoints (1008 samples × 3 channels); 100 inducing points is the default, or set the number with `n_u`.
gp.fit(outputs=['cameraB'], continuous_dims=['cameraA'], categorical_dims=['channel'], sparse=True)

# Make predictions for each channel and plot
X = gp.prepare_grid()
fig, axs = plt.subplots(1, 3, figsize=(18, 4))
for ax, channel in zip(axs, ['R', 'G', 'B']):
    y = gp.predict_grid(categorical_levels={'channel': channel}, with_noise=False)
    gmb.ParrayPlotter(X, y).plot(ax=ax)

Now you’re predicting the cameraB value from the cameraA value with correlations across channels.
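
Building the DataFrame described above from the two arrays in the question might look like this (again with hypothetical rgb_A / rgb_B arrays of shape (1008, 3)):

import numpy as np
import pandas as pd

channels = ['R', 'G', 'B']
df = pd.DataFrame({
    'sample': np.repeat(np.arange(rgb_A.shape[0]), 3),
    'channel': np.tile(channels, rgb_A.shape[0]),
    'cameraA': rgb_A.ravel(),  # assumed scaled into (0, 1)
    'cameraB': rgb_B.ravel(),
})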

Update: the Gumbi docs are now live and it’s installable from PyPI.
