Dimensions for y variable in gp.marginal_likelihood

I’m working on matching color-space transforms between two cameras using their RGB data. Both cameras shot the same color charts, giving me around 1000 samples. I defined X as the camera A data (1008, 3) and y as the camera B data (1008, 3), but I get this error:

ValueError: Input dimension mis-match. (input[0].shape[1] = 3, input[1].shape[1] = 1008)

When I define y as a single channel (1008,1) it runs. Can the y variable have multiple columns in gp.marginal_likelihood?
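
A stripped-down sketch of the setup (the model and priors here are just illustrative, not my exact code):

import pymc3 as pm

X = rgb_A  # (1008, 3) camera A RGB values, assumed already loaded
Y = rgb_B  # (1008, 3) camera B RGB values

with pm.Model():
    ls = pm.Gamma('ls', alpha=2, beta=1, shape=3)
    cov = pm.gp.cov.ExpQuad(3, ls=ls)
    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal('sigma', sigma=0.1)
    # Raises the dimension mismatch above:
    # gp.marginal_likelihood('y', X=X, y=Y, noise=sigma)
    # Runs when y is a single channel:
    gp.marginal_likelihood('y_R', X=X, y=Y[:, 0], noise=sigma)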

Currently no… but we’re really trying to get this into the next major release. In the meantime, this might be helpful.

If you’re comfortable experimenting a little, I’ve written a package intended to make this sort of wrangling easier: https://github.com/JohnGoertz/Gumbi (Gumbi: Gaussian Process Model Building Interface). It uses pymc3 under the hood, but aims to take care of all the reshaping, transforming, and standardization needed.

I’ve been working on it for a while, but I’ve only just started making it public, uploading it to PyPI, hosting the documentation on ReadTheDocs, and so on. I feel it’s well documented, but at the moment you need to build the documentation locally. You can also just take a look at the few example notebooks: https://github.com/JohnGoertz/Gumbi/tree/main/docs/source/notebooks

In your case, you would store your data as a single tall DataFrame df with one column for “pixel” (0-1007, or whatever is appropriate), one column for “camera” (‘A’ or ‘B’), one column for “channel” (‘R’/‘G’/‘B’), and one column for “value” (probably best normalized to lie strictly between 0 and 1).
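
For instance, assuming the raw data are two (1008, 3) arrays rgb_A and rgb_B (hypothetical names, one per camera), that tall frame could be built like this:

import numpy as np
import pandas as pd

channels = ['R', 'G', 'B']
frames = []
for camera, arr in [('A', rgb_A), ('B', rgb_B)]:
    frames.append(pd.DataFrame({
        'pixel': np.repeat(np.arange(arr.shape[0]), 3),   # 0-1007, one block of 3 rows per pixel
        'camera': camera,
        'channel': np.tile(channels, arr.shape[0]),
        'value': arr.ravel(),                             # assumed already scaled into (0, 1)
    }))
df = pd.concat(frames, ignore_index=True)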

The steps with Gumbi would then just be:

import gumbi as gmb
import matplotlib.pyplot as plt

ds = gmb.DataSet(df, outputs=['value'], logit_vars=['value'])  # Omit logit_vars if your data is not normalized to (0, 1)

gp = gmb.GP(ds)

# You may need to pass `sparse=True` for your ~6k datapoints; 100 inducing points is the default, or set the number with `n_u`.
gp.fit(outputs=['value'], continuous_dims=['pixel'], categorical_dims=['camera', 'channel'], sparse=True)

# Make predictions for each combination of camera/channel and plot
X = gp.prepare_grid()
fig, axs = plt.subplots(2, 3, figsize=(18, 8))
for row, camera in zip(axs, ['A', 'B']):           # rows of the grid = cameras
    for ax, channel in zip(row, ['R', 'G', 'B']):  # columns = channels
        y = gp.predict_grid(categorical_levels={'camera': camera, 'channel': channel}, with_noise=False)
        gmb.ParrayPlotter(X, y).plot(ax=ax)

That setup will use a coregionalization kernel to learn correlations between all combinations of camera+channel, inspired by this other implementation by @bwengals.
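
If you’re curious what that looks like in raw pymc3, the rough shape of a coregionalized kernel (a hedged sketch, not Gumbi’s actual internals; priors are illustrative) is:

import pymc3 as pm

n_outputs = 6  # 2 cameras x 3 channels
rank = 2       # rank of B = W @ W.T + diag(kappa)

with pm.Model():
    # X would carry two columns: the continuous input and an integer output index (0-5)
    ls = pm.Gamma('ls', alpha=2, beta=1)
    cov_input = pm.gp.cov.ExpQuad(2, ls=ls, active_dims=[0])
    W = pm.Normal('W', mu=0, sigma=1, shape=(n_outputs, rank))
    kappa = pm.Gamma('kappa', alpha=1.5, beta=1, shape=n_outputs)
    coreg = pm.gp.cov.Coregion(2, W=W, kappa=kappa, active_dims=[1])
    gp = pm.gp.Marginal(cov_func=cov_input * coreg)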

The package is still in development, so let me know if you need any help or have any suggestions for improvement!


On second thought, I realized I misunderstood your system. More likely what you want is to build your dataframe with one column for “sample” (0-1007, if you want; we won’t actually use this), one column for “channel” (‘R’/‘G’/‘B’), one column for “cameraA”, and one for “cameraB”. The Gumbi model becomes

ds = gmb.DataSet(df, outputs=['cameraB'], logit_vars=['cameraA', 'cameraB'])  # Omit logit_vars if your data is not normalized to (0, 1)

gp = gmb.GP(ds)

# You may need to pass `sparse=True` for your ~3k datapoints (1008 samples × 3 channels); 100 inducing points is the default, or set the number with `n_u`.
gp.fit(outputs=['cameraB'], continuous_dims=['cameraA'], categorical_dims=['channel'], sparse=True)

# Make predictions for each channel and plot
X = gp.prepare_grid()
fig, axs = plt.subplots(1, 3, figsize=(18, 4))
for ax, channel in zip(axs, ['R', 'G', 'B']):
    y = gp.predict_grid(categorical_levels={'channel': channel}, with_noise=False)
    gmb.ParrayPlotter(X, y).plot(ax=ax)

Now you’re predicting the cameraB value from the cameraA value with correlations across channels.
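
Building the DataFrame described above from the two arrays in the question might look like this (again with hypothetical rgb_A / rgb_B arrays of shape (1008, 3)):

import numpy as np
import pandas as pd

channels = ['R', 'G', 'B']
df = pd.DataFrame({
    'sample': np.repeat(np.arange(rgb_A.shape[0]), 3),
    'channel': np.tile(channels, rgb_A.shape[0]),
    'cameraA': rgb_A.ravel(),  # assumed scaled into (0, 1)
    'cameraB': rgb_B.ravel(),
})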

Update: the Gumbi docs are now live and it’s installable from PyPI.
