Hi,

Thanks a lot for your answers already. I will try to give some more informations together with a minimal code example.

In the code below, I do the mass-balance model calibration just for one glacier given the observation (only one mean 20-year observation + 1 uncertainty estimate) and given prior estimates of the three free parameters (this comes from a pre-calibration where more observations are used but this runs only for a few glaciers). Instead of actually calling the mass-balance model (it means computing the mass-balance for each of the 20 years at different altitudinal points), I would like to use a GP emulator. The real mass-balance model is quite “nested” (opens up a lot of different methods, see e.g. massbalance-sandbox/mbmod_daily_oneflowline.py at 7c055cc96e45389f9d1643cf6b5c28bca643957c · OGGM/massbalance-sandbox · GitHub) … I don’t think it is feasible to write it into aesara code.

In the code below I just use a very simple interpolation to show how the format should look like:

```
import pymc3 as pm
# from external interpolation (using a simplification!!!)
slope_melt_f_test = -12.114951532711634
slope_pf_test = 2019.224324758106
# observational mean mass-balance data for the Aletsch glacier (2000-2020)
dmdtda = -1.2107 *1e3
err_dmdtda = 0.1315 *1e3
max_allowed_specificMB = -142197.2962739946 # glacier would then melt entirely in just one year -> does not make sense!
with pm.Model() as model:
# let's assume fake distributions of a precipitation and a melt factor
pf = pm.TruncatedNormal('pf', mu=1.8,
sigma=0.5,
lower=0.1, upper=10)
melt_f = pm.TruncatedNormal('melt_f', mu=160,
sigma=100,
lower=10, upper=1000)
# temperature bias
# temp_bias = pm.Normal('temp_bias', mu=0,
# sigma=4)
# --> I guess it makes most sense to replace the MB model by an emulator (from GP regression)
# it should have a similar format as aet_mbs (see below)
# Here is just a fake MB model ( it works only without temp_bias and is only a simplification of the MB model!)
# slope_melt_f_test and slope_pf_test comes from a simple linear external interpolation
aet_slope_melt_fs = pm.Data('aet_slope_melt_fs',
[slope_melt_f_test])
aet_slope_pfs = pm.Data('aet_slope_pfs',
[slope_pf_test])
aet_mbs = aet_slope_pfs * pf + aet_slope_melt_fs * melt_f
### end of MB model
mb_mod = pm.Deterministic('mb_mod', aet_mbs)
observed = pm.Data('observed', [dmdtda ])
sigma = pm.Data('sigma', [err_dmdtda ])
geodetic_massbal = pm.TruncatedNormal('geodetic_massbal',
mu=mb_mod,
sigma=sigma, # standard deviation
observed=observed,
lower=max_allowed_specificMB) # likelihood
# chain length should be longer, here just for presentation:
trace = pm.sample(5000, chains=3, tune=5000, target_accept=0.98,
compute_convergence_checks=True, cores=3,
return_inferencedata=True)
#burned_trace = trace.sel(draw=slice(1000, None))
ppc = pm.sample_posterior_predictive(trace, #cores=cores,
random_seed=42,
var_names=['geodetic_massbal',
'pf', 'melt_f',
'mb_mod',
],
keep_size=True)
```

And just as a side-information: of course we have more than just one observation for the Aletsch glacier. However, I would like to have a framework that works for all glaciers world-wide (n > 200.000) and that runs inside of the Open Global Glacier Model, https://docs.oggm.org/en/stable/. What we have for almost every glacier is only one “reliable” measurement. As I have to repeat the MB calibration (and before the calibration of the emulator) for every glacier and maybe even with different settings, it should run relatively fast to be able to do regional to global analyses.

Now I tried to create a GP emulator inside of “pure” PyMC3. I run my mass-balance model with a selection of parameter combinations of pf, melt_f, temp_bias (here inside pd.DataFrame `pd_spec_mb`

) and then use that to calibrate the GP:

```
# normalize data
X = pd_spec_mb[['temp_bias','prcp_fac','melt_f'] ].values
y = pd_spec_mb['spec_mb_mean'].values # modelled mean mass-balance over 20 years
X_mean = X.mean(axis=0)
X_std = X.std(axis=0)
X_norm = (X - X_mean) / X_std
y_mean = y.mean()
y_std = y.std()
y_norm = (y - y_mean) / y_std
# Split into training and test data
X_train_norm,X_test_norm,y_train_norm,y_test_norm = sklearn.model_selection.train_test_split(X_norm,y_norm, test_size=0.8)
y_test = y_test_norm*y_std + y_mean
X_test = X_test_norm*X_std + X_mean
with pm.Model() as model_reg:
# hyperprior for lengthscale kernel parameter
l = pm.Gamma('l', alpha=2, beta=0.5) #
# instantiate a covariance function
input_dim = 3
cov = pm.gp.cov.ExpQuad(input_dim, ls=l)
# instantiate a GP prior
gp = pm.gp.Marginal(cov_func=cov)
# prior
epsilon = pm.HalfNormal('epsilon', 0.001)
# likelihood
y_pred_norm = gp.marginal_likelihood('y_pred_norm', X=X_train_norm, y=y_train_norm, noise=epsilon, is_observed=True)
mp = pm.find_MAP()
y_test_emulator = gp.predict(X_test_norm, point=[mp], diag=True)[0]*y_std + y_mean
```

If I just want one estimate, would it make sense to use the MAP here? With the last line, I only compare how well the emulator has worked. Does it make sense to do the GP in that way together with the normalistion?

**But my main question is actually: how can I feed the results of the GP calibration into my actual mass-balance calibration from the code above?**. I would like to get it into a shape that I can use it instead of `aet_mbs`

. I tried to use `gp. predict`

or `gp.conditional`

but in both cases I was not able to get the GP model in the right shape to actually take as input the TransformedRV (pf, melt_f, temp_bias) distributions from the code block above. Do you know about any example that does this? I have tried to search the examples, but I could not really find something that is doing what I need to do.

Greetings,

Lily