Use Exact Gaussian Process model from GPyTorch as emulator in PyMC3

I have an external model (mass-balance model of a world-wide glacier model) that I would like to calibrate with PyMC3. It has three free parameters. It is too complicated to convert it into aesara in order that it can be calibrated by using NUTs (at least not with my knowledge of aesara). So, I was thinking about creating an “emulator” (via a Gaussian Process Regression) that represents the MB model sufficiently good in order that I can use the emulator to actually run the calibration of that model. For that emulator, I really want to have something fast and simple (e.g. sth. like from GPyTorch). So I have thought about two options:

  • I have tried to find a solution to import the GP-model of GPyTorch inside of PyMC3 (e.g. looking at e.g. ). However, I was not able to let it work for my use case. I have to admit that I really don’t understand how aesara works… Is there a simple way to do this? Could I use sth. as explained here: Use saved gaussian model from sci-kit in pymc3? ?

  • Another option is to directly use PyMC3 to create the emulator using a Gaussian Process. So far, it took me too long to calibrate the emulator using the MCMC sampling. I guess I did not use the right prior assumptions, but when I run with a smaller chain length or less training values, the emulator(GP model) is not exact enough. For the emulator, I really don’t need a Bayesian calibration, but just one perfect parameter combination. Is there any way to “reproduce” the simple ExactGaussianProcess model from GPyTorch directly inside of PyMC3/aesara in a computationally inexpensive way?

Thanks a lot in advance!

1 Like

For the emulator, I really don’t need a Bayesian calibration, but just one perfect parameter combination.

Would it be possible to use the GP implementations in pymc, but used fixed hyperparameter values (like lengthscale), taken from your PyTorch model? That would help computationally.

I’m not familiar with glacier mass balance models, but if you did change your mind and want to pursue getting it into Aesara, there are lots of folks here who could help! I know people have embedded differential equation solvers into PyMC models.


May I ask how you implemented the GP in PyMC? I think the best way to go in this case could be to use the marginal likelihood implementation. That ought to make PyMC’s sampling a little bit easier by integrating out the GP and reducing the number of parameters that have to be sampled. To recreate what GPyTorch is doing, you should be able to use the MAP estimate. Out of interest: how many data points you are training on, and also how fast should the model be?


Thanks a lot for your answers already. I will try to give some more informations together with a minimal code example.
In the code below, I do the mass-balance model calibration just for one glacier given the observation (only one mean 20-year observation + 1 uncertainty estimate) and given prior estimates of the three free parameters (this comes from a pre-calibration where more observations are used but this runs only for a few glaciers). Instead of actually calling the mass-balance model (it means computing the mass-balance for each of the 20 years at different altitudinal points), I would like to use a GP emulator. The real mass-balance model is quite “nested” (opens up a lot of different methods, see e.g. massbalance-sandbox/ at 7c055cc96e45389f9d1643cf6b5c28bca643957c · OGGM/massbalance-sandbox · GitHub) … I don’t think it is feasible to write it into aesara code.

In the code below I just use a very simple interpolation to show how the format should look like:

import pymc3 as pm

# from external interpolation (using a simplification!!!)
slope_melt_f_test = -12.114951532711634
slope_pf_test = 2019.224324758106
# observational mean mass-balance data for the Aletsch glacier (2000-2020)
dmdtda =  -1.2107 *1e3 
err_dmdtda = 0.1315 *1e3
max_allowed_specificMB = -142197.2962739946 # glacier would then melt entirely in just one year -> does not make sense!

with pm.Model() as model:
    # let's assume fake distributions of a precipitation and a melt factor
    pf = pm.TruncatedNormal('pf', mu=1.8,
                                lower=0.1, upper=10)
    melt_f = pm.TruncatedNormal('melt_f', mu=160,
                                lower=10, upper=1000)
    # temperature bias
    # temp_bias = pm.Normal('temp_bias', mu=0,
    #                                      sigma=4)
    # --> I guess it makes most sense to replace the MB model by an emulator (from GP regression)
    # it should have a similar format as aet_mbs (see below)
    # Here is just a fake MB model ( it works only without temp_bias and is only a simplification of the MB model!) 
    # slope_melt_f_test and slope_pf_test comes from a simple linear external interpolation 
    aet_slope_melt_fs = pm.Data('aet_slope_melt_fs',
    aet_slope_pfs = pm.Data('aet_slope_pfs',
    aet_mbs = aet_slope_pfs * pf + aet_slope_melt_fs * melt_f
    ### end of MB model 
    mb_mod = pm.Deterministic('mb_mod', aet_mbs)

    observed = pm.Data('observed', [dmdtda ])
    sigma = pm.Data('sigma', [err_dmdtda ])
    geodetic_massbal = pm.TruncatedNormal('geodetic_massbal',
                                          sigma=sigma,  # standard deviation
                                          lower=max_allowed_specificMB)  # likelihood
    # chain length should be longer, here just for presentation:
    trace = pm.sample(5000, chains=3, tune=5000, target_accept=0.98,
                              compute_convergence_checks=True, cores=3,
    #burned_trace = trace.sel(draw=slice(1000, None))
    ppc = pm.sample_posterior_predictive(trace, #cores=cores,
                                                            'pf', 'melt_f',

And just as a side-information: of course we have more than just one observation for the Aletsch glacier. However, I would like to have a framework that works for all glaciers world-wide (n > 200.000) and that runs inside of the Open Global Glacier Model, What we have for almost every glacier is only one “reliable” measurement. As I have to repeat the MB calibration (and before the calibration of the emulator) for every glacier and maybe even with different settings, it should run relatively fast to be able to do regional to global analyses.

Now I tried to create a GP emulator inside of “pure” PyMC3. I run my mass-balance model with a selection of parameter combinations of pf, melt_f, temp_bias (here inside pd.DataFrame pd_spec_mb) and then use that to calibrate the GP:

# normalize data 
X = pd_spec_mb[['temp_bias','prcp_fac','melt_f'] ].values  
y = pd_spec_mb['spec_mb_mean'].values  # modelled mean mass-balance over 20 years

X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_norm = (X - X_mean) / X_std

y_mean = y.mean()
y_std = y.std()

y_norm = (y - y_mean) / y_std

# Split into training and test data
X_train_norm,X_test_norm,y_train_norm,y_test_norm = sklearn.model_selection.train_test_split(X_norm,y_norm, test_size=0.8)
y_test = y_test_norm*y_std + y_mean
X_test = X_test_norm*X_std + X_mean

with pm.Model() as model_reg:
    # hyperprior for lengthscale kernel parameter
    l = pm.Gamma('l', alpha=2, beta=0.5) # 
    # instantiate a covariance function 
    input_dim = 3
    cov =, ls=l)

    # instantiate a GP prior 
    gp =
    # prior
    epsilon = pm.HalfNormal('epsilon', 0.001)
    # likelihood 
    y_pred_norm = gp.marginal_likelihood('y_pred_norm', X=X_train_norm, y=y_train_norm, noise=epsilon, is_observed=True)
    mp = pm.find_MAP()
    y_test_emulator = gp.predict(X_test_norm, point=[mp],  diag=True)[0]*y_std + y_mean

If I just want one estimate, would it make sense to use the MAP here? With the last line, I only compare how well the emulator has worked. Does it make sense to do the GP in that way together with the normalistion?

But my main question is actually: how can I feed the results of the GP calibration into my actual mass-balance calibration from the code above?. I would like to get it into a shape that I can use it instead of aet_mbs . I tried to use gp. predict or gp.conditional but in both cases I was not able to get the GP model in the right shape to actually take as input the TransformedRV (pf, melt_f, temp_bias) distributions from the code block above. Do you know about any example that does this? I have tried to search the examples, but I could not really find something that is doing what I need to do.



1 Like

Hi Lily,

I think what you’re trying to do with your nested model is similar to what I was working with in this post and this Colab notebook. In there, I was using the (Latent) GP to model just one parameter in a larger model - I think that’s what you’re trying to do? A key point is you don’t actually use any predict method within the model, you just use the GP itself as the prior on a parameter, and then everything gets estimated simultaneously based on your observations: the GP, the parameter of interest, and all the remaining parameters in your model.

I’m not as familiar with GPs as emulators, though, so I may be misinterpreting your intended workflow.