Hello,

I modelled my 2-dimensional data (one being a continuous value and one being the category the datapoint stems from) with gaussian processes in 2 ways that should actually be equivalent but somehow I yield different prediction results (the posterior seems fine). The only difference is basically that one time I create a fit model from scratch and a predict model by using copy.deepcopy() and then adding the prediction variables while the other time I explicitely create two separate (though equal apart from the prediction part) models.

So this is what I did regarding the first approach:

```
coords={
'index_train': df_train.index,
'index_predict': df_predict.index,
'category': df_train['CATEGORY'].cat.categories,
'input_dim': ['ANGLE', 'CAT_ID'],
'output_dim': ['FORCE'],
}
def warp_periodic(x, period=360):
alpha = 2*np.pi/period*x[:,0]
return at.stack((at.sin(alpha), at.cos(alpha)), axis=1)
with pm.Model(coords=coords) as model_train:
# Data
x_train = pm.ConstantData('x_train', df_train[['ANGLE', 'CAT_ID']], dims=('index_train', 'input_dim'))
y_train = pm.ConstantData('y_train', df_train['FORCE'], dims='index_train')
# Hyperparameter Priors
sigma_mu = pm.ConstantData('sigma_mu', [20, 2, 0.2, 0.2, 0.2, 0.02], dims='category')
sigma_sigma = pm.ConstantData('sigma_sigma', [10, 1, 0.1, 0.1, 0.1, 0.01], dims='category')
sigma = pm.InverseGamma('sigma', mu=sigma_mu, sigma=sigma_sigma, dims='category')
eta_mu = pm.ConstantData('eta_mu', 25)
eta_sigma = pm.ConstantData('eta_sigma', 20)
eta = pm.InverseGamma('eta', mu=eta_mu, sigma=eta_sigma)
ls_mu = pm.ConstantData('ls_mu', 0.6)
ls_sigma = pm.ConstantData('ls_sigma', 0.3)
ls = pm.InverseGamma('ls', mu=ls_mu, sigma=ls_sigma)
lsn_mu = pm.ConstantData('lsn_mu', 0.15)
lsn_sigma = pm.ConstantData('lsn_sigma', 0.1)
lsn = pm.InverseGamma('lsn', mu=lsn_mu, sigma=lsn_sigma)
noise_mu = pm.ConstantData('noise_mu', 1e-1)
noise_sigma = pm.ConstantData('noise_sigma', 8e-2)
noise = pm.InverseGamma('noise', mu=noise_mu, sigma=noise_sigma) + 1e-4
# Mean & Covariance Functions
cov = (eta**2
* pm.gp.cov.WarpedInput(input_dim=2, cov_func=pm.gp.cov.ExpQuad(2, ls=ls), warp_func=warp_periodic, active_dims=[0])
+ pm.gp.cov.Coregion(input_dim=2, W=at.zeros_like(sigma), kappa=sigma, active_dims=[1])**2
* pm.gp.cov.WarpedInput(input_dim=2, cov_func=pm.gp.cov.Matern52(2, ls=lsn), warp_func=warp_periodic, active_dims=[0]))
mean = pm.gp.mean.Zero()
# Marginal GP Likelihood
gp = pm.gp.Marginal(mean_func=mean, cov_func=cov)
force_obs = gp.marginal_likelihood('force_obs', X=x_train, y=y_train, sigma=noise, dims='index_train')
with deepcopy(model_train) as model_predict:
# Conditional GP Prediction
x_predict = pm.ConstantData('x_predict', df_predict[['ANGLE', 'CAT_ID']], dims=('index_predict', 'input_dim'))
force_predict = gp.conditional('force_predict', Xnew=x_predict, pred_noise=True, dims='index_predict')
```

The other time I didnâ€™t use the deepcopy() but just created another model including the prediction part from scratch.

```
with pm.Model(coords=coords) as model_predict:
... # full model definition
```

The results were surprisingly quite different. Here are 2 plots (I confirmed that the plots are actually correct according to the prediction results) for the category where it was most visible:

First approach (using deepcopy):

Second approach (separate models from scratch):

It almost appears like the length scales of the Matern52 would be different. Does anyone have an idea what caused this behavior? Thanks in advance!