Equivalent GP models, different results

jan5 · October 16, 2023, 1:58pm

Hello,

I modelled my 2-dimensional data (one being a continuous value and one being the category the datapoint stems from) with gaussian processes in 2 ways that should actually be equivalent but somehow I yield different prediction results (the posterior seems fine). The only difference is basically that one time I create a fit model from scratch and a predict model by using copy.deepcopy() and then adding the prediction variables while the other time I explicitely create two separate (though equal apart from the prediction part) models.

So this is what I did regarding the first approach:

coords={
    'index_train': df_train.index,
    'index_predict': df_predict.index,
    'category': df_train['CATEGORY'].cat.categories,
    'input_dim': ['ANGLE', 'CAT_ID'],
    'output_dim': ['FORCE'],
}

def warp_periodic(x, period=360):
    alpha = 2*np.pi/period*x[:,0]
    return at.stack((at.sin(alpha), at.cos(alpha)), axis=1)

with pm.Model(coords=coords) as model_train:
    # Data
    x_train = pm.ConstantData('x_train', df_train[['ANGLE', 'CAT_ID']], dims=('index_train', 'input_dim'))
    y_train = pm.ConstantData('y_train', df_train['FORCE'], dims='index_train')

    # Hyperparameter Priors
    sigma_mu = pm.ConstantData('sigma_mu', [20, 2, 0.2, 0.2, 0.2, 0.02], dims='category')
    sigma_sigma = pm.ConstantData('sigma_sigma', [10, 1, 0.1, 0.1, 0.1, 0.01], dims='category')
    sigma = pm.InverseGamma('sigma', mu=sigma_mu, sigma=sigma_sigma, dims='category')

    eta_mu = pm.ConstantData('eta_mu', 25)
    eta_sigma = pm.ConstantData('eta_sigma', 20)
    eta = pm.InverseGamma('eta', mu=eta_mu, sigma=eta_sigma)

    ls_mu = pm.ConstantData('ls_mu', 0.6)
    ls_sigma = pm.ConstantData('ls_sigma', 0.3)
    ls = pm.InverseGamma('ls', mu=ls_mu, sigma=ls_sigma)

    lsn_mu = pm.ConstantData('lsn_mu', 0.15)
    lsn_sigma = pm.ConstantData('lsn_sigma', 0.1)
    lsn = pm.InverseGamma('lsn', mu=lsn_mu, sigma=lsn_sigma)
    
    noise_mu = pm.ConstantData('noise_mu', 1e-1)
    noise_sigma = pm.ConstantData('noise_sigma', 8e-2)
    noise = pm.InverseGamma('noise', mu=noise_mu, sigma=noise_sigma) + 1e-4
    
    # Mean & Covariance Functions
    cov = (eta**2
        * pm.gp.cov.WarpedInput(input_dim=2, cov_func=pm.gp.cov.ExpQuad(2, ls=ls), warp_func=warp_periodic, active_dims=[0])
        + pm.gp.cov.Coregion(input_dim=2, W=at.zeros_like(sigma), kappa=sigma, active_dims=[1])**2
        * pm.gp.cov.WarpedInput(input_dim=2, cov_func=pm.gp.cov.Matern52(2, ls=lsn), warp_func=warp_periodic, active_dims=[0]))
    mean = pm.gp.mean.Zero()

    # Marginal GP Likelihood
    gp = pm.gp.Marginal(mean_func=mean, cov_func=cov)
    force_obs = gp.marginal_likelihood('force_obs', X=x_train, y=y_train, sigma=noise, dims='index_train')

with deepcopy(model_train) as model_predict:
    # Conditional GP Prediction
    x_predict = pm.ConstantData('x_predict', df_predict[['ANGLE', 'CAT_ID']], dims=('index_predict', 'input_dim'))
    force_predict = gp.conditional('force_predict', Xnew=x_predict, pred_noise=True, dims='index_predict')

The other time I didn’t use the deepcopy() but just created another model including the prediction part from scratch.

with pm.Model(coords=coords) as model_predict:
   ... # full model definition

The results were surprisingly quite different. Here are 2 plots (I confirmed that the plots are actually correct according to the prediction results) for the category where it was most visible:
First approach (using deepcopy):

Second approach (separate models from scratch):

It almost appears like the length scales of the Matern52 would be different. Does anyone have an idea what caused this behavior? Thanks in advance!

bwengals · October 16, 2023, 6:58pm

Hmm, nothing obvious jumps out. It does look like both should be equivalent. When are you calling pm.sample?

Maybe one prediction includes the likelihood noise and the other doesn’t?

jan5 · October 17, 2023, 10:09am

In both cases I’m calling it just after calling sample_prior_predictive. I wrapped them both in a function:

idata = az.InferenceData()
with model_train:
     idata.extend(sample_prior_predictive(**kwargs_prior), join='right')
     idata.extend(sample(**kwargs_posterior), join='right')

Could it be that deepcopy is making problems by not copying the tensors properly?

ricardoV94 · October 17, 2023, 10:36am

Maybe, I wouldn’t be surprised. Cloning a PyMC model is not trivial and I don’t expect deepcopy to do a good job. This is specially relevant for GPs, as they hold references to old variables in the instances.

PyMC has a clone_model utility, but for similar reasons that won’t work properly with GP objects.
https://www.pymc.io/projects/docs/en/latest/api/model/generated/pymc.model.fgraph.clone_model.html

jan5 · October 17, 2023, 2:54pm

It seems that deepcopy is indeed the problem. This time I used only one model for both fitting and prediction and got the same results as in the approach with 2 separate models created from scratch.

Topic		Replies	Views
Question about "gp.predict": Gaussian Processes Questions	2	833	September 2, 2021
`conditional` and Deep Gaussian Processes v5 gaussian_process	6	958	April 17, 2023
Multi-output gaussian processes Questions	13	5454	October 22, 2017
Modeling longitudinal data among patients by coupling gaussian processes with random effects Questions	5	1669	September 2, 2021
Shape mismatch when running predictions with Gaussian Processes version agnostic gaussian_process	5	69	March 12, 2025

Equivalent GP models, different results

Related topics