Using skopt's minimizing functions in find_MAP

The new Scikit-Optimize library (https://scikit-optimize.github.io) looks interesting for optimizing noisy black-box functions. PyMC3 could use it for finding the maximum a posteriori, since it would be less prone to getting stuck in local optima. For instance, gp_minimize does Bayesian optimization using Gaussian Processes. However, it looks like its API does not match scipy.optimize.fmin's, so it probably doesn't work out of the box with find_MAP's fmin parameter. Could we support these optimizing functions, maybe through a dedicated parameter?

That would be a cool idea, especially if there are ones that work well with discrete RVs - it would be a great alternative for ADVI init.

Actually, it should be possible, as you can pass extra args to fmin.
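For instance, something along these lines (a minimal sketch, reusing the toy Beta-Binomial model from the proof of concept below; the maxiter value is just an illustration of an extra argument being forwarded):

import pymc3 as pm
from scipy import optimize

with pm.Model() as model:
    p = pm.Beta('p', 1., 1.)
    obs = pm.Binomial('obs', p=p, n=5, observed=2)
    # fmin only needs a scipy.optimize.fmin-style callable; extra
    # positional/keyword arguments passed to find_MAP are forwarded to it.
    start = pm.find_MAP(fmin=optimize.fmin_powell, maxiter=10000)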

OK, it is a bit more complicated, as the output from gp_minimize is an OptimizeResult object rather than a plain array; below is a proof of concept:

import numpy as np
import pymc3 as pm
from skopt import gp_minimize


def gp_minimize2(func, *args, **kwargs):
    # The objective passed in by find_MAP returns a 0-d array;
    # gp_minimize wants a plain Python scalar.
    def func2(x):
        return np.asscalar(func(x))
    # gp_minimize returns an OptimizeResult; find_MAP only needs
    # the optimum, which is stored under 'x'.
    return gp_minimize(func2, dimensions=[(-2.0, 2.0)], *args, **kwargs)['x']


with pm.Model() as model:
    p = pm.Beta('p', 1., 1.)
    obs = pm.Binomial('obs', p=p, n=5, observed=2)
    start1 = pm.find_MAP()
    start2 = pm.find_MAP(fmin=gp_minimize2)

Thanks, I wanted to try it, but when the gradient is not available (which is my case) fmin is set to optimize.fmin_powell:
https://github.com/pymc-devs/pymc3/blob/master/pymc3/tuning/starting.py#L72-L78
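Such a bypass might look like this (a sketch of what it could look like, not necessarily what was actually run; the (-100, 100) bounds and the n_calls budget are placeholders):

import numpy as np
from skopt import gp_minimize

def neg_logp(x):
    # rmap() turns the flat parameter array back into a {name: value}
    # point dict that the compiled logp function accepts.
    point = model.bijection.rmap(np.asarray(x))
    return -float(model.logp(point))

# Skip find_MAP entirely, so its optimizer selection never overrides fmin.
n_dim = len(model.bijection.map(model.test_point))
res = gp_minimize(neg_logp, dimensions=[(-100.0, 100.0)] * n_dim, n_calls=50)
start = model.bijection.rmap(np.asarray(res['x']))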

Bypassing that, it seems to work, but I'm not sure how the RVs are registered in the model (the dimensions we pass to gp_minimize need to refer to the right RVs). Is it in the order they are declared in the with Model() context manager?

In theory yes, but currently we are not using an OrderedDict, so sometimes the output could be funky. You can check the order by doing:

model.bijection.ordering.vmap

which means that the output of:

model.bijection.map(model.test_point)

is the value of model.test_point flattened according to the order in model.bijection.ordering.vmap.
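Concretely, on the toy Beta-Binomial model from above, inspecting that ordering might look like this (a sketch; it assumes the VarMap entries expose var and slc fields, and note that the Beta variable is registered under its transformed name, something like p_logodds__):

import pymc3 as pm

with pm.Model() as model:
    p = pm.Beta('p', 1., 1.)
    obs = pm.Binomial('obs', p=p, n=5, observed=2)

# Each entry maps a (transformed) variable name to its slice of the
# flat array produced by bijection.map().
for vmap in model.bijection.ordering.vmap:
    print(vmap.var, vmap.slc)

flat = model.bijection.map(model.test_point)  # point dict -> flat array
print(flat)
print(model.bijection.rmap(flat))             # flat array -> point dict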

FYI, I am using the following on a larger model:

import numpy as np
import pymc3 as pm
from skopt import gp_minimize


def gp_minimize2(func, *args, **kwargs):
    def func2(x):
        return np.asscalar(func(x))
    # One box constraint per entry of the flattened parameter vector.
    n_dim = len(model.bijection.map(model.test_point))
    return gp_minimize(func2, dimensions=[(-100.0, 100.0)] * n_dim,
                       *args, **kwargs)['x']


# `model` is the larger model mentioned above, defined elsewhere.
with model:
    start1 = pm.find_MAP()
    start2 = pm.find_MAP(fmin=gp_minimize2)

It is running; however, it is incredibly slow.

Same for me, and the results are worse than with fmin_powell. I am using Python 3.6, so the dicts are ordered. Even on a basic test (without PyMC3) it is slow and the results are weird…
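For reference, a basic test of that kind might be as simple as this (a sketch, not the exact code used: minimizing a one-dimensional quadratic; even this is noticeably slow, since every iteration refits a Gaussian process):

from skopt import gp_minimize

# Minimize (x - 0.3)^2 over [-2, 2]; the optimum should be near x = 0.3.
res = gp_minimize(lambda x: (x[0] - 0.3) ** 2,
                  dimensions=[(-2.0, 2.0)],
                  n_calls=30,
                  random_state=0)
print(res.x, res.fun)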

It might be better to try https://github.com/GPflow/GPflowOpt. https://github.com/fmfn/BayesianOptimization also uses scikit-learn's Gaussian processes for Bayesian optimization.

Just curious, but what’s the cause of the slowness you guys are having? Is it just that evaluating logp is slow since your models are large?

I think the slowness is b/c of the gp_minimize function from Scikit-Optimize.