Metamodeling strategy for an arbitrary generative model

Hello - I’m looking for an efficient general-purpose metamodel representation that could be used for inference in pymc3. Maybe someone out there has some experience or suggestions. I have not seen the topic addressed here.

For example, in environmental engineering we have many physical and chemical generative models, developed over decades, that can’t readily be ported into pymc3; however, Bayesian inference could be useful for fitting and understanding them. For context, think of a site-specific groundwater model built on USGS software - we might want to know what measured chemical concentrations say about some underlying variable, say, the rate of biodegradation.

So the plan is to run the parent model a number of times with a variety of input parameters, make a metamodel representation of the input and output data in a format that pymc3 can handle, then perform inference on the input parameters from measured output parameters. I’d like some advice on the form the metamodel should take.
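To make the first step concrete, here is a minimal sketch of building the training set: sample the input space and run the parent model at each point. The `parent_model` function is purely hypothetical - a stand-in for the real external simulation:

```python
import numpy as np

rng = np.random.default_rng(42)

def parent_model(theta):
    # Hypothetical stand-in for the real generative model
    # (e.g. a groundwater simulation): maps an input parameter
    # vector to a vector of output concentrations.
    k, s = theta
    return np.array([np.exp(-k) * s, k * s**2])

# Sample the input space and run the parent model at each sample.
n_runs, n_inputs = 50, 2
X_train = rng.uniform(low=[0.0, 0.5], high=[2.0, 1.5], size=(n_runs, n_inputs))
Y_train = np.array([parent_model(x) for x in X_train])
# X_train, Y_train are the data the metamodel is fit to.
```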

The metamodel would be a deterministic mapping between input and output variables. A linear model would be too simple to capture most parent models. I implemented a Gaussian process metamodel without the error term, so it would be deterministic, but I found that it was really sensitive to the length-scale parameter, and I had trouble consistently getting faithful interpolations of the data points from the parent model. Maybe there is a better GP approach? Here is my code for reference:

import pymc3 as pm
import theano.tensor as tt

def gp_calc(predictors, X, Y, X_new, ls=0.1):
    '''
    Noise-free GP conditional mean (deterministic interpolator)
    predictors: list of predictors (sets the input dimension)
    X: predictor values to train on
    Y: predicted values to train on
    X_new: new prediction points
    ls: length scale
    returns mu: predicted Y_new values
    '''
    cov = pm.gp.cov.ExpQuad(input_dim=len(predictors), ls=ls)
    # Small jitter on the diagonal for numerical stability
    K = cov(X) + 1e-8 * tt.eye(X.shape[0])
    Ks = cov(X, X_new)
    # Solve K v = Y rather than forming the explicit inverse
    mu = tt.dot(Ks.T, tt.slinalg.solve(K, Y))
    return mu
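On the length-scale sensitivity: one common remedy is to pick `ls` by maximizing the GP marginal likelihood on the training runs before freezing the metamodel, rather than hand-tuning it. A plain numpy/scipy sketch (squared-exponential kernel, unit amplitude, no noise term - all choices here are illustrative, not from the post):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_marginal_likelihood(log_ls, X, y, jitter=1e-6):
    """Negative GP log marginal likelihood for a squared-exponential
    kernel with unit amplitude and no noise term."""
    ls = np.exp(log_ls)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / ls**2) + jitter * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -log p(y | X, ls), up to the usual Gaussian normalization terms
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * len(y) * np.log(2 * np.pi)

# Example: choose ls for 1-D toy training data
X = np.linspace(0, 1, 20)[:, None]
y = np.sin(4 * X[:, 0])
res = minimize_scalar(neg_log_marginal_likelihood, bounds=(-4, 2),
                      args=(X, y), method="bounded")
ls_opt = np.exp(res.x)
```

The optimized `ls_opt` can then be passed into `gp_calc`, which keeps the metamodel deterministic inside the pymc3 graph while making the interpolation less arbitrary.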

I’m not too concerned about the exact interpolation method between data points in the parent model, provided it is simple, explainable, and faithful to the model. A nearest-neighbors interpolator would be fine for my purpose, but maybe there are too many discontinuities in the metamodel to get good inference in pymc3? A simple linear interpolator such as scipy.interpolate.LinearNDInterpolator would also be fine, but it has edges that could make inference difficult. Since I would need to implement these in theano, I’d like to have some idea that they would work before coding them up.

If you have read this far, then thanks! Let me know if you have a suggested strategy.