Combine pymc3 with a scikit-learn object

Hi, I’m using pymc3 to optimize two parameters in my GaussianProcessRegressor, which is a scikit-learn object. I have tried many approaches but it still won’t run, so I came here for suggestions.
Below is my code:

import sys
import numpy as np
import pymc3 as pm
from sklearn.gaussian_process import GaussianProcessRegressor

X = data_1['param']  # which is 88×400
Cp_E = data_1['Cp_E']  # 1×400
Y = data_1['Cp_M']  # 88×400
gaussian = GaussianProcessRegressor()
fitting = gaussian.fit(X, Y)

def main(argv=None):
  with pm.Model() as model_:
    Cdt1 = pm.Normal('Cdt1', mu=20., sd=10.)
    CDESkeps = pm.Normal('CDESkeps', mu=0.6, sd=3.)
    epsilon = pm.Uniform('epsilon', lower=0, upper=1)
    Z = np.array([Cdt1, CDESkeps])
    YX = gaussian.predict(Z.reshape(1, -1))
    y_pred = pm.Normal('y_pred', mu=YX, sd=epsilon, observed=Cp_E)
    start = pm.find_MAP()
    step = pm.NUTS(scaling=start)
    trace_ = pm.sample(5000, step=step, start=start)
  parameters = ['Cdt1', 'CDESkeps']
  pm.traceplot(trace_, parameters)

if __name__ == '__main__':
  sys.exit(main())

The error is "setting an array element with a sequence", even though I have confirmed that YX and Cp_E both have shape (1, 400).
I hope someone can offer a suggestion. Thanks in advance.

I am confused by the code.
Cdt1 and CDESkeps are the two parameters, but they are passed as inputs in YX=gaussian.predict(Z.reshape(1, -1)).

Can we see the actual error message?

I have the exact same problem. I have some X and Y values on which I fit a Gaussian process regression (with scikit-learn), so that I can call it inside the Bayesian calibration, for which I use PyMC3.

The error message I receive is the following:

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\gaussian_process\_gpr.py:371 in predict
X = self._validate_data(X, ensure_2d=ensure_2d, dtype=dtype, reset=False)

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\base.py:566 in _validate_data
X = check_array(X, **check_params)

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\utils\validation.py:746 in check_array
array = np.asarray(array, order=order, dtype=dtype)

ValueError: setting an array element with a sequence.

Attachment: Simple example on combining Scikit and PyMC3.py (2.6 KB)

The issue here seems to be that you are passing a tensor, i.e. a theano/aesara random variable (teta), to sklearn, which doesn’t know about such things. The tensor then gets cast to an array (sklearn’s input validation calls np.asarray on it), but this can’t work because tensors are symbolic: they are “empty” and hold no concrete values until the model graph is evaluated at runtime. If you are interested in using GPs “inside” of pymc models, I would suggest checking out the GP submodule of pymc itself (pm.gp) and the associated notebooks.
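
To make the casting problem concrete, here is a minimal sketch that needs only NumPy. SymbolicScalar and try_to_numeric_array are hypothetical names invented for this illustration; the class is a rough stand-in for a theano/aesara variable (it supports indexing but carries no number), not the real thing, and the float conversion only approximates what sklearn's input validation does.

```python
import numpy as np


class SymbolicScalar:
    """Hypothetical stand-in for a symbolic tensor variable.

    It has a name and supports indexing syntax, but it holds no number.
    """

    def __init__(self, name):
        self.name = name

    def __getitem__(self, index):
        # Indexing a symbolic variable yields another symbolic variable.
        return SymbolicScalar(f"{self.name}[{index}]")

    def __repr__(self):
        return self.name


def try_to_numeric_array(values):
    """Attempt a float conversion like the one in the traceback above."""
    try:
        return np.asarray(values, dtype=np.float64), None
    except (TypeError, ValueError) as err:
        return None, err


cdt1 = SymbolicScalar("Cdt1")
cdeskeps = SymbolicScalar("CDESkeps")

# np.array accepts the symbols, but only as opaque Python objects.
z = np.array([cdt1, cdeskeps])
print(z.dtype)

# Turning those objects into floats is the step that fails.
result, error = try_to_numeric_array(z.reshape(1, -1))
print(type(error).__name__, error)

# Concrete numbers pass through the same conversion without trouble.
ok, _ = try_to_numeric_array(np.array([20.0, 0.6]).reshape(1, -1))
print(ok.shape)
```

The reshape succeeds in both cases, which is why checking shapes does not reveal the problem: the array has the right shape, but its elements are placeholders instead of numbers.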