Multidimensional inference and GP surrogate problem

Hello,

I have been using Pymc3 for a few months now, but I’d still consider myself a novice. Similarly to what’s described in @NateAM’s thread, I am struggling to put together a model that lets me infer the input parameters to GPR surrogate models.

In my case, a physics model with 5 inputs and 90 outputs was fitted with 90 GPR surrogates in Matlab. The methodology I am trying to implement is as follows:

  • Extract the kernel function parameters from Matlab and load them into Pymc3
  • In Pymc3, set up a kernel function and build a covariance matrix using the parameters above
  • Set the priors on my 5 input parameters that I am trying to infer
  • Redefine the GP for each output
  • Define likelihood function and use the measured output data to infer/calibrate the inputs.

I have tried various options and read through a number of threads, but I just can’t get it right. Please see below an extract of my code. It is broken, clunky, and of poor quality, but I want to get it working first and then optimise and tidy it up.

I’d really appreciate any help or guidance with this. I am a keen reader and want to understand the details, so if appropriate, just point me to the relevant documentation. I get various errors which suggest that I am not even setting up my model in the correct way.

    import numpy as np
    import pymc3 as pm
    import theano.tensor as tt

    # sm_coeff is a 2D np array with 90 rows, each holding five 'sigma_l' length-scale
    # parameters and one 'sigma_f' parameter for an Exponentiated Quadratic kernel

    len_measured = len(measured)  # this is 90
    no_inputs = 5  # hardcoded for convenience

    with pm.Model():
        # model covariance matrix with measurement error with fixed sigma
        sigma = 0.2
        var = sigma ** 2
        true_cov = np.zeros([len_measured, len_measured])
        np.fill_diagonal(true_cov, var)

        # Define priors for the unknown parameters
        priors = pm.Uniform('priors', 0.5, 2, shape=(1, no_inputs))

        # Define the surrogate models for the outputs using the priors defined above
        # Redefine the gaussian process
        # signal variance (sigma_f ** 2) times an ARD Exponentiated Quadratic kernel
        cov = np.asscalar(sm_coeff[0, 5]) ** 2 * \
            pm.gp.cov.ExpQuad(input_dim=no_inputs, ls=sm_coeff[0, :-1])
        gp = pm.gp.Latent(cov_func=cov)
        f_pred = gp.prior('f_pred_0', X=priors, shape=1)

        for i, coeff in enumerate(sm_coeff[1:]):
            cov = np.asscalar(coeff[5]) ** 2 * \
                pm.gp.cov.ExpQuad(input_dim=no_inputs, ls=coeff[:-1])
            gp = pm.gp.Latent(cov_func=cov)
            f_pred = tt.concatenate(
                [f_pred, gp.prior('f_pred_{}'.format(i+1), X=priors, shape=1)])

        # Define loglikelihood as Multivariate Normal with defined covariance matrix and observations
        loglikelihood = pm.MvNormal(
            'loglikelihood', mu=f_pred, cov=true_cov, observed=measured, shape=len_measured)

        # Inference
        # step = pm.Metropolis() using Metropolis sampling
        step = pm.NUTS()  # using NUTS sampling
        # draw 3000 posterior samples
        trace = pm.sample(3000, step=step, cores=1)

Best regards,
Pawel

Hi @Chmielar,

Welcome to the forum! Will you post the errors you are seeing? That may help people debug the model.

I’ve ended up wrapping the surrogate models in another library (GitHub - SMTorg/smt: Surrogate Modeling Toolbox) and that works pretty well. It provides gradients, so NUTS works, which is a big plus. Another benefit is that all the different surrogates use the same interface (GPs are included as a subset), so I can try out different approaches very quickly.
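Roughly, the wrapping is just a small Theano Op that calls the surrogate in perform and supplies the Jacobian through grad, so NUTS can see gradients. Below is a simplified sketch from memory rather than my actual code: smt_model stands in for an already-trained SMT surrogate, and I’m assuming its predict_values / predict_derivatives methods, so double-check the exact names against your SMT version.

import numpy as np
import theano.tensor as tt


class SurrogateGradOp(tt.Op):
    # Returns J^T . g, where J is the Jacobian of the surrogate outputs w.r.t. the inputs.
    itypes = [tt.dvector, tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, smt_model):
        self.smt_model = smt_model

    def perform(self, node, inputs, outputs):
        x, g = inputs
        # column k of the Jacobian: derivatives of all outputs w.r.t. input k
        jac = np.column_stack([
            self.smt_model.predict_derivatives(x.reshape(1, -1), kx).ravel()
            for kx in range(x.size)
        ])
        outputs[0][0] = jac.T.dot(g)


class SurrogateOp(tt.Op):
    # Maps the input parameters to the surrogate's predicted outputs.
    itypes = [tt.dvector]
    otypes = [tt.dvector]

    def __init__(self, smt_model):
        self.smt_model = smt_model
        self.grad_op = SurrogateGradOp(smt_model)

    def perform(self, node, inputs, outputs):
        (x,) = inputs
        y = self.smt_model.predict_values(x.reshape(1, -1))
        outputs[0][0] = np.asarray(y, dtype=np.float64).ravel()

    def grad(self, inputs, output_grads):
        (x,) = inputs
        (g,) = output_grads
        # chain rule: d(cost)/dx = J^T . d(cost)/dy
        return [self.grad_op(x, g)]

Inside the pymc3 model it then becomes something like mu = SurrogateOp(smt_model)(priors), and that mean goes into the likelihood as usual; NUTS picks up the gradients through the Op.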

Nathan

Hi Nathan,

Thank you for your quick response. I am a bit constrained in what I can do and what I can install but I will definitely investigate the Surrogate Modelling Toolbox you recommended.

I kept working on the problem described above over the weekend, but without much success. If I extend the uniform prior to ridiculous bounds (e.g. -80000 to 80000), I can get Pymc3 to sample, but it is incredibly slow and estimates ~12 h to complete. Obviously something I’ve done is not right, because Matlab does this in a couple of minutes. There are a few things I am struggling with that are possibly wrong in my implementation:

  • Wrong size/shape of the tensors going into the kernel function. I have kernel parameters for 90 pre-fit GPs, with a separate length scale for each predictor; Matlab calls this an “ARD Squared Exponential Kernel”. I assumed I handle this through ExpQuad(input_dim=no_inputs, ls_inv=sm_coeff[:, :-1]), but I’m not entirely sure what the difference is between ls and ls_inv (see the short sketch after this list).
  • I may have a scaling problem when lifting the kernel parameters from Matlab into Pymc3.
  • There is something wrong with the model I have put together.
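On the ls vs ls_inv point, my current reading of the ExpQuad docstring is that ls_inv is simply 1/ls, so the two forms below should produce the same covariance matrix. This is just a toy check with made-up numbers (and only two inputs) to illustrate what I mean; please correct me if I’ve misread it:

import numpy as np
import pymc3 as pm

# made-up length scales and inputs, purely for illustration
ls = np.array([1.5, 0.7])
X = np.random.rand(4, 2)

K_ls = pm.gp.cov.ExpQuad(input_dim=2, ls=ls)(X).eval()
K_inv = pm.gp.cov.ExpQuad(input_dim=2, ls_inv=1.0 / ls)(X).eval()

print(np.allclose(K_ls, K_inv))  # I expect True if ls_inv really is 1/ls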

Please see below updated code with the error message.

import theano

with pm.Model():
    # covariance matrix with measurement error, using a fixed sigma
    sigma = 0.2
    var = sigma ** 2
    true_cov = np.zeros([len_measured, len_measured])
    np.fill_diagonal(true_cov, var)

    # Define priors for the unknown parameters
    priors = pm.Uniform('priors', .1, 5,
                        shape=(1, no_inputs))

    # Define the surrogate models for the outputs using the priors defined above
    # Redefine the gaussian process
    sigma_f = theano.shared(sm_coeff[:, 5], 'sigma_f')
    sigma_l = theano.shared(sm_coeff[:, :-1], 'sigma_l')
    cov = sigma_f ** 2 * pm.gp.cov.ExpQuad(input_dim=no_inputs, ls_inv=sigma_l)
    gp = pm.gp.Latent(cov_func=cov)
    f_pred = gp.prior('f_pred', X=priors, shape=len_measured)

    # Define loglikelihood as Multivariate Normal with cov and observations
    loglikelihood = pm.MvNormal(
        'loglikelihood', mu=f_pred, cov=true_cov, observed=measured,
         shape=len_measured)

    # Inference
    # step = pm.Metropolis() using Metropolis sampling
    step = pm.NUTS()  # using NUTS sampling
    # draw 3000 posterior samples
    trace = pm.sample(3000, step=step, cores=1)

The error I currently get is:

Exception has occurred: LinAlgError
73-th leading minor of the array is not positive definite

Any help will be much appreciated.

Regards,
Pawel

Hi @Chmielar,

I’m in a similar situation where I can’t really install anything but since SMT is available via conda I can at least get that on my computer.

One thing that might help is doing something like:

true_cov = np.zeros([len_measured, len_measured])
np.fill_diagonal(true_cov, var)
true_cov += 1e-9 * np.eye(len_measured)

This can help if your covariance matrix is badly conditioned, and it would seem to address the exception you are seeing. You can vary the 1e-9 a bit to see how it changes the results. A lot of libraries do this sort of thing under the hood, but I don’t believe that pymc3 does.

Another, and far more invasive, approach would be to abandon sampling from the GP in pymc3 and do the following:

  1. Evaluate the Gaussian process at many input combinations (say 10x the number of samples).

  2. Compute the principal component analysis (PCA) decomposition of the (n samples x n outputs) matrix of evaluations using either sklearn or SVD in numpy. Doing it in numpy would let you keep using NUTS, which would be good. The SVD would be something like
    u, s, vh = np.linalg.svd((evaluations - evaluations.mean(0)) / evaluations.std(0))
    and, writing e for that standardised evaluations matrix, the singular values can be recovered via
    s = u.T.dot(e).dot(vh.T)

  3. Drop the components below some threshold of variance. numpy.linalg.svd returns the singular values in decreasing order, so you can plot their relative sizes with
    matplotlib.pyplot.plot(s / s[0])
    to see the relative importance of each component’s contribution. Usually there is a “drop-off” after which the components matter less and less.

  4. Now you can compute the components of your measured sample via the above transformation:
    s_measured = ((measured - evaluations.mean(0)) / evaluations.std(0)).dot(vh.T)
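Putting steps 2 to 4 together in one place, this is roughly what I mean. It’s only a sketch: evaluations stands for the (n_samples x n_outputs) array of surrogate outputs from step 1, measured is your vector of 90 measured outputs, and the n_components value is something you would pick by eye from the plot.

import numpy as np
import matplotlib.pyplot as plt

# evaluations: (n_samples x n_outputs) surrogate outputs from step 1
# measured: the 90 measured outputs from the thread above
mean, std = evaluations.mean(0), evaluations.std(0)
e = (evaluations - mean) / std  # standardise

# step 2: SVD / PCA decomposition
u, s, vh = np.linalg.svd(e, full_matrices=False)

# step 3: look for the drop-off in the relative singular values
plt.plot(s / s[0], 'o-')
plt.ylabel('relative singular value')
plt.show()

n_components = 5  # illustrative; pick this from the plot

# step 4: project the measured outputs onto the retained components
s_measured = ((measured - mean) / std).dot(vh.T)[:n_components]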

So now your pymc3 code could become:

with pymc3.Model() as model:

    priors = pymc3.Uniform('priors', .1, 5, shape=(no_inputs))
    noise = pymc3.HalfNormal('noise', sd=1, shape=(n_components))

    c_pred = ((priors - evaluations.mean(0)) / evaluations.std(0)).dot(vh.T)[:n_components]

    logLikelihood = pymc3.Normal('logLikelihood', mu=c_pred, sd=noise, observed=s_measured)

Sometimes this has worked for me and sometimes not, but it’s a useful tool to have in your back pocket. It also evaluates very quickly, which is nice.


Hi Nathan,

I know this is an old thread, but do you have a code example of the wrapping process? I don’t understand how you “export” the gradient so that it can be used in NUTS.