I’ve been working through this for a while, but I think it’s time to get some help on this one. I have a very expensive function f of N variables (theta_n) to evaluate. I don’t know the gradient of the function with respect to the variables. I would like to fit my variables to some test data that I have.
Since my function is so expensive, I’ve taken to building a surrogate model of the output. I run the model with a set of parameters to get outputs, then run a PCA on the outputs to reduce the data. I map my parameters from their distributions to variables which are unit normally distributed using a copula (i.e. X_n = pm.Normal(mu=0, sd=1)). I now have the amplitude outputs of my simulations from the PCA and the corresponding X_n values.
I determine which components of my PCA are important to my test data and then form Gaussian processes which map from my normalized parameters to the relevant PCA amplitudes. I should note that I’m fitting my Gaussian processes with GPy and then writing the covariance functions myself; I’ve done tests to make sure I get the same results. I do this because I don’t need distributions on my Gaussian process hyperparameters, just the best fit.
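For concreteness, a simplified sketch of how I build the surrogate (the data here are random stand-ins, and the RBF kernel and variable names are just illustrative):

import numpy as np
import GPy
from sklearn.decomposition import PCA

# Stand-ins for my real data: X_samples are the copula-mapped (unit normal) parameters
# for each run, Y_samples are the corresponding outputs of the expensive model.
n_runs, N, n_outputs = 200, 6, 1000
X_samples = np.random.randn(n_runs, N)
Y_samples = np.random.randn(n_runs, n_outputs)

# Reduce the simulation outputs with PCA; keep the components relevant to the test data.
pca = PCA(n_components=5)
amplitudes = pca.fit_transform(Y_samples)        # (n_runs, n_components)

# One GP per retained component, mapping normalized parameters -> PCA amplitude.
# I only keep the optimized hyperparameters (best fit), no posteriors over them.
gps = []
for k in range(amplitudes.shape[1]):
    kern = GPy.kern.RBF(input_dim=N, ARD=True)
    gp = GPy.models.GPRegression(X_samples, amplitudes[:, [k]], kern)
    gp.optimize()
    gps.append(gp)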
I can then form my pymc3 model as:
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as model:
    # Normalized (copula-mapped) parameters: one row of N standard normals
    X = pm.Normal('X', mu=0, sd=1, shape=(1, N))
    # Surrogate prediction for the first PCA component: GP mean and variance at X
    y_1, variance_1 = GaussianProcess(X)
    ...
    # Likelihood: GP prediction vs. the observed test amplitude,
    # with the GP variance and the test-data variance added
    r1 = pm.Normal('r1', mu=y_1,
                   sd=tt.sqrt(variance_1 + test_variance_1),
                   observed=test_amplitude_1)
    ...
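Roughly, GaussianProcess is just the GPy predictive equations rewritten in Theano so that the mean and variance are symbolic in X. A simplified single-GP version with a squared-exponential kernel (the names here are illustrative, and alpha / K_inv are numpy arrays precomputed from the GPy fit):

import theano.tensor as tt

def gp_predict(X_new, X_train, alpha, K_inv, lengthscales, kern_var):
    # Squared-exponential cross-covariance between the new point(s) and the training inputs
    d = (X_new[:, None, :] - X_train[None, :, :]) / lengthscales
    Kxs = kern_var * tt.exp(-0.5 * tt.sum(d ** 2, axis=-1))   # (n_new, n_train)

    # Predictive mean and variance; alpha = (K + noise*I)^{-1} y and K_inv = (K + noise*I)^{-1}
    # come from the GPy model as plain numpy arrays, so only the prediction is symbolic.
    mean = tt.dot(Kxs, alpha)
    var = kern_var - tt.sum(tt.dot(Kxs, K_inv) * Kxs, axis=1)
    return mean, var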
I’ve found that this approach gives me a lot of divergences. One thought I had was that my surrogate was bad, so I can run the expensive model again at the median of X, redo the PCA, and refit my Gaussian processes. When I then rerun the pymc3 model I still get a lot of divergences; re-running doesn’t seem to decrease them.
That being said, I can compare the evaluation of the expensive function at the median to the inverse PCA of the posterior predictive values of y_1, y_2, ..., and they seem to align pretty well, so I’m tempted to say that I’m getting good results despite the divergences.
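For reference, that check looks something like this (assuming y_1, y_2, ... are recorded as pm.Deterministic nodes so they end up in the trace; run_expensive_model is a placeholder for the real code, and pca is the fitted PCA object from above):

import numpy as np
import pymc3 as pm

with model:
    trace = pm.sample(2000, tune=2000)

# Run the expensive model once at the posterior median of the normalized parameters.
X_median = np.median(trace['X'], axis=0)
expensive_at_median = run_expensive_model(X_median)

# Reconstruct the surrogate prediction by inverting the PCA on the posterior mean amplitudes
# (one column per retained PCA component).
y_post = np.column_stack([trace['y_1'], trace['y_2']])
reconstructed = pca.inverse_transform(y_post.mean(axis=0))

# These agree reasonably well, despite the divergences during sampling.
print(np.max(np.abs(reconstructed - expensive_at_median)))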
I’ve tried to solve the divergences by using a non-centered formulation:
with pm.Model() as model:
    # Hyperparameters of the non-centered parameterization
    mu = pm.Normal('mu', mu=0, sd=1)
    tau = pm.HalfNormal('tau', sd=1)
    # Standard-normal latent variables, shifted and scaled to give X
    lvs = pm.Normal('lvs', mu=0, sd=1, shape=(1, N))
    X = pm.Deterministic('X', mu + tau * lvs)
    ...
but I haven’t seen any improvement.
Any thoughts about how to solve the divergences or whether my overall approach is flawed?