2D Gaussian Process: error generating posterior predictive samples

Hello!

I am new to PyMC (and getting reacquainted with Python after a long time of using R and Matlab for data analysis), and I’m trying to use a Gaussian Process (GP) model for interpolation. I have values spaced randomly in 2D, and I want to fit a GP model to predict values between them.

I have code to specify my model (based on the GP example here), and I’m able to fit the model using NUTS, but I receive the following error when I use the pymc.gp.Marginal.conditional function to specify the new points in 2D at which I want to predict values:

TypeError: Unsupported dtype for TensorType: object

(to keep the post short, the full error trace is in this attached text file:
error_traceback.txt (3.3 KB))

I found one other post on Discourse with this error message, but that seemed to be due to mixing Aesara and Theano code. As you can see in my reproducible example below, I am not doing this. Does anyone have an idea of what could be the issue? Thanks in advance for your help!

Package versions:

  • numpy: 1.24.2
  • pandas: 1.5.3
  • PyMC: 5.2.0
  • pytensor: 2.10.1
import numpy as np
import pandas as pd
import pymc as pm

from numpy.random import default_rng
from math import pi


'''
PREPARE DATA
'''

# In my case, data are loaded using `pd.read_csv`, but this chunk generates a similar pandas data frame.
n = 50
expol = np.random.uniform(-pi/2, 5*pi/2, size=n)
x_disk = np.random.uniform(size=n)
y_disk = np.random.uniform(size=n)
df = pd.DataFrame(data=np.column_stack((expol, x_disk, y_disk)),
                  index=np.random.choice(10000, n, replace=False),
                  columns=["expol", "x_disk", "y_disk"])

# Extract just the x and y values as an array
xy = df[["x_disk", "y_disk"]].values


'''
SET UP THE MODEL
'''

with pm.Model() as gp_fit:
    # Covariance parameters
    rho = pm.HalfCauchy('rho', 5)
    eta = pm.HalfCauchy('eta', 5)

    # Mean and convariance
    M = pm.gp.mean.Zero()
    K = (eta ** 2) * pm.gp.cov.ExpQuad(2, rho)

    # Measurement noise
    sigma = pm.HalfNormal('sigma', 50)

    # Instantiate the GP and provide data for evaluating likelihood
    recruit_gp = pm.gp.Marginal(mean_func=M, cov_func=K)
    recruit_gp.marginal_likelihood('recruits', X=xy, y=df['expol'], sigma=sigma)


'''
FIT THE MODEL
'''

with gp_fit:
    trace = pm.sample(target_accept=0.95, chains=4, return_inferencedata=True, cores=1)


'''
SAMPLE FROM THE POSTERIOR AT NEW LOCATIONS
'''

# Specify the points at which to generate predictions
x_new = np.linspace(np.min(df["x_disk"]), np.max(df["x_disk"]), 10)
y_new = np.linspace(np.min(df["y_disk"]), np.max(df["y_disk"]), 10)
xs, ys = np.asarray(np.meshgrid(x_new, y_new))
xy_new = np.asarray([xs.ravel(), ys.ravel()]).T

with gp_fit:
    # Specify the new locations (this is where the error is thrown)
    expol_pred = recruit_gp.conditional("expol_pred", Xnew=xy_new)

    # Sample from the posterior at the new locations
    gp_expol_samples = pm.sample_posterior_predictive(trace, vars=[expol_pred], samples=3, random_seed=42)

Hi Sebas!

This seems to be happening because of a bug in gp.conditional. It looks like it needs some extra checks internally to correctly convert pandas DataFrames that are given as data when you fit the model.

In this short term, what this means for you is that you can make the error go away by adding .values to your data in the marginal likelihood function:

    recruit_gp.marginal_likelihood('recruits', X=xy, y=df['expol'].values, sigma=sigma)

(The syntax for pm.sample_posterior_predictive has changed since that notebook was written as well, so you’ll still get some errors there – check the docs).

A best practice to avoid issues like this is to use pm.MutableData or pm.ConstantData to hold data in your models. These have other benefits to using these as well, but in this case it just helps to avoid unexpected errors from mixing tensors, ndarrays, and dataframes.

2 Likes

Hi Jesse,

Thanks for your quick response! Your solution of taking .values from the data frame that I was inputting to the marginal likelihood function solved this problem for me. After updating my call to pm.sample_posterior_predictive, I was able to generate predictions as I wanted.

I’ll look into pm.MutableData and pm.ConstantData to avoid future type errors and keep up with best practices. Thanks again!

1 Like