I have been using Pymc3 for a few months now but I’d consider myself a novice. Similarly to what’s described in @NateAM’s thread, I am struggling with putting together a model that allows me to infer the input parameters of GPR models.
In my case, a physics model with 5 inputs and 90 outputs was fitted with 90 GPR surrogates in Matlab. The methodology I am trying to implement is as follows:
Extract kernel function parameters from Matlab and load to Pymc3
In Pymc3 set up a kernel function and build a covariance matrix using parameters above
Set the priors on my 5 input parameters that I am trying to infer
Redefine the GP for each output
Define likelihood function and use the measured output data to infer/calibrate the inputs.
I tried various different options and read through a number of threads but just can’t get it right. Please see below an extract of my code. It is broken, clunky and of poor quality, but I want to get it working first and then optimise and tidy it up.
I’d really appreciate any help or guidance with this. I am a keen reader and want to understand the details, so if appropriate, just point me to the relevant documentation. I get various errors which imply that I am not even setting up my model in the correct way.
# sm_coeff is a 2D np array with 90 sets of 5-off 'sigma_l' and 1-off 'sigma_f' parameters
# of the Exponentiated Quadratic kernel
import numpy as np
import pymc3 as pm
import theano.tensor as tt

len_measured = len(measured)  # this is 90
no_inputs = 5                 # hardcoded for convenience

with pm.Model():
    # observation covariance matrix with measurement error with fixed sigma
    sigma = 0.2
    var = sigma ** 2
    true_cov = np.zeros([len_measured, len_measured])
    np.fill_diagonal(true_cov, var)

    # Define priors for the unknown parameters
    priors = pm.Uniform('priors', 0.5, 2, shape=(1, no_inputs))

    # Define the surrogate models for the outputs using the priors defined above:
    # redefine the gaussian process for each output, with sigma_f**2 scaling the kernel
    cov = np.asscalar(sm_coeff[0, 5]) ** 2. * \
        pm.gp.cov.ExpQuad(input_dim=no_inputs, ls=sm_coeff[0, :-1])
    gp = pm.gp.Latent(cov_func=cov)
    f_pred = gp.prior('f_pred_0', X=priors, shape=1)
    for i, coeff in enumerate(sm_coeff[1:]):
        cov = np.asscalar(coeff[5]) ** 2. * \
            pm.gp.cov.ExpQuad(input_dim=no_inputs, ls=coeff[:-1])
        gp = pm.gp.Latent(cov_func=cov)
        f_pred = tt.concatenate(
            [f_pred, gp.prior('f_pred_{}'.format(i + 1), X=priors, shape=1)])

    # Define the likelihood as Multivariate Normal with the fixed covariance matrix
    # and the observations
    loglikelihood = pm.MvNormal(
        'loglikelihood', mu=f_pred, cov=true_cov, observed=measured,
        shape=len_measured)

    # Inference
    # step = pm.Metropolis()  # using Metropolis sampling
    step = pm.NUTS()  # using NUTS sampling
    # draw 3000 posterior samples
    trace = pm.sample(3000, step=step, cores=1)
Welcome to the forum! Will you post the errors you are seeing? That may help people debug the model.
I’ve ended up wrapping the surrogate models in another library (SMT, the Surrogate Modeling Toolbox: https://github.com/SMTorg/smt) and that works pretty well. They provide gradients, so NUTS works, which is a big plus. The big benefit is that all the different surrogates use the same interface (GPs are included as a subset), so I can try out different approaches very quickly.
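Roughly, the wrapping amounts to exposing the surrogate’s predictions and gradients to Theano through a custom Op. Below is a minimal sketch, not my exact code: the class names are made up for illustration, and it assumes SMT’s predict_values / predict_derivatives interface for a surrogate with a single output.

import numpy as np
import theano.tensor as tt


class SurrogateGrad(tt.Op):
    """Gradient of a scalar surrogate output with respect to the input vector."""

    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, surrogate):
        self.surrogate = surrogate

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        x2d = x.reshape(1, -1)
        # SMT returns the derivative with respect to one input dimension at a time
        grad = np.array([
            self.surrogate.predict_derivatives(x2d, kx)[0, 0]
            for kx in range(x2d.shape[1])
        ])
        outputs[0][0] = grad


class SurrogateEval(tt.Op):
    """Scalar surrogate prediction with a registered gradient, so NUTS can use it."""

    itypes = [tt.dvector]
    otypes = [tt.dscalar]

    def __init__(self, surrogate):
        self.surrogate = surrogate
        self.grad_op = SurrogateGrad(surrogate)

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        value = self.surrogate.predict_values(x.reshape(1, -1))[0, 0]
        outputs[0][0] = np.array(value)

    def grad(self, inputs, output_grads):
        (x,) = inputs
        (g,) = output_grads
        # chain rule: upstream scalar gradient times the surrogate's gradient vector
        return [g * self.grad_op(x)]

In a model like yours, each of the 90 outputs would get one such Op wrapping its own fitted surrogate, and the Op outputs can be stacked with tt.stack to form the mean of the MvNormal likelihood.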
Thank you for your quick response. I am a bit constrained in what I can do and what I can install but I will definitely investigate the Surrogate Modelling Toolbox you recommended.
I kept working on the problem described above over the weekend, but without much success. If I extend the uniform prior to ridiculous bounds (e.g. -80000, 80000), I can get Pymc3 to sample, but it is incredibly slow and quotes ~12 h to complete. Obviously something I’ve done is not right, because Matlab does it in a couple of minutes. There are a few things I am struggling with that are possibly wrong in my implementation:
Wrong size/shape of the tensors at the input to the kernel function. I have kernel parameters for 90 pre-fit GPs, with a separate length scale for each predictor; Matlab calls this an “ARD Squared Exponential Kernel”. I assumed I deal with it through ExpQuad(input_dim=no_inputs, ls_inv=sm_coeff[:, :-1]), but I’m not entirely sure what the difference is between ls and ls_inv.
I may have a problem with scaling when lifting the kernel parameters from Matlab into Pymc3.
There is something wrong with the model I put together.
Please see below updated code with the error message.
import numpy as np
import pymc3 as pm
import theano

with pm.Model():
    # observation covariance matrix with measurement error with fixed sigma
    sigma = 0.2
    var = sigma ** 2
    true_cov = np.zeros([len_measured, len_measured])
    np.fill_diagonal(true_cov, var)

    # Define priors for the unknown parameters
    priors = pm.Uniform('priors', .1, 5, shape=(1, no_inputs))

    # Define the surrogate models for the outputs using the priors defined above:
    # redefine the gaussian process
    sigma_f = theano.shared(sm_coeff[:, 5], 'sigma_f')
    sigma_l = theano.shared(sm_coeff[:, :-1], 'sigma_l')
    cov = sigma_f ** 2 * pm.gp.cov.ExpQuad(input_dim=no_inputs, ls_inv=sigma_l)
    gp = pm.gp.Latent(cov_func=cov)
    f_pred = gp.prior('f_pred', X=priors, shape=len_measured)

    # Define the likelihood as Multivariate Normal with the fixed covariance matrix
    # and the observations
    loglikelihood = pm.MvNormal(
        'loglikelihood', mu=f_pred, cov=true_cov, observed=measured,
        shape=len_measured)

    # Inference
    # step = pm.Metropolis()  # using Metropolis sampling
    step = pm.NUTS()  # using NUTS sampling
    # draw 3000 posterior samples
    trace = pm.sample(3000, step=step, cores=1)
The error I currently get is:
Exception has occurred: LinAlgError
73-th leading minor of the array is not positive definite
Adding a small jitter (e.g. 1e-9) to the diagonal of the covariance matrix can help if it is very badly conditioned, which would seem to address the exception that you are seeing. You can try changing the 1e-9 around a bit to see how that changes the results. A lot of libraries do this sort of thing beneath the hood, but I don’t believe that pymc3 does this.
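The idea is just the following; a toy illustration with made-up names (stabilize, K), not pymc3 code:

import numpy as np


def stabilize(K, jitter=1e-9):
    """Return K with a small constant added to its diagonal."""
    return K + jitter * np.eye(K.shape[0])


K = np.array([[1.0, 1.0],
              [1.0, 1.0]])             # singular: np.linalg.cholesky(K) raises LinAlgError
L = np.linalg.cholesky(stabilize(K))   # succeeds once the jitter is added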
Another, and far more invasive, approach would be to abandon sampling from the GP in pymc3 and do the following:
Evaluate the Gaussian process at many input combinations (say 10x the number of samples).
Compute the principal component analysis (PCA) decomposition of the (n samples x n outputs) evaluation matrix, using either sklearn or SVD in numpy. numpy would let you use NUTS, which would be good. SVD would be something like u, s, vh = np.linalg.svd((evaluations - evaluations.mean(0)) / evaluations.std(0))
and the component scores of the standardized evaluation matrix e can then be computed via scores = e.dot(vh.T).
Drop the components below some threshold of variance. numpy.linalg.svd returns the singular values in decreasing order, so you could plot their relative size with matplotlib.pyplot.plot(s / s[0]) and get the relative importance of each contribution. Usually there is a “drop-off” where the components become less and less important.
Now you can compute the components of your measured sample via the same transformation: s_measured = ((measured - evaluations.mean(0)) / evaluations.std(0)).dot(vh.T). A rough sketch of these steps follows below.
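Sketch of steps 2-4 with stand-in data: evaluations plays the role of the (n samples x n outputs) matrix from step 1 and measured the length-90 measurement vector, so only the transformations matter here.

import numpy as np

rng = np.random.RandomState(0)
evaluations = rng.normal(size=(900, 90))   # stand-in for step 1: (n samples x n outputs)
measured = rng.normal(size=90)             # stand-in for the measured outputs

mean, std = evaluations.mean(0), evaluations.std(0)
e = (evaluations - mean) / std             # standardized evaluations

u, s, vh = np.linalg.svd(e, full_matrices=False)
scores = e.dot(vh.T)                       # component scores of each evaluation

rel = s / s[0]                             # relative importance of each component
n_keep = int(np.sum(rel > 0.01))           # keep components above some cutoff, e.g. 1%

# project the measured outputs onto the retained components
s_measured = ((measured - mean) / std).dot(vh[:n_keep].T)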
Sometimes this has worked for me and sometimes not. It’s a useful tool to have in your back pocket, though, and it does evaluate very quickly, which is nice.
I know this is an old thread, but do you have a code example of the wrapping process? I don’t understand how you “export” the gradient to be used in NUTS.