Combine pymc3 with a scikit-learn object

Vincy_1008 · May 25, 2021, 8:36am

Hi, I’m using pymc3 to optimize two parameters of my GaussianProcessRegressor, which is a scikit-learn object. I have tried many approaches, but the code still won’t run, so I came here for suggestions.
Below is my code:

X = data_1['param'] #which is 88×400
Cp_E = data_1['Cp_E'] #1×400
Y = data_1['Cp_M'] #88×400
gaussian=GaussianProcessRegressor()
fiting=gaussian.fit(X, Y)

def main(argv=None):
  with pm.Model() as model_:
    Cdt1 = pm.Normal('Cdt1', mu = 20., sd = 10.)
    CDESkeps = pm.Normal('CDESkeps', mu = 0.6, sd = 3.)
    epsilon = pm.Uniform('epsilon', lower = 0, upper = 1)
    Z = np.array([Cdt1, CDESkeps])
    YX=gaussian.predict(Z.reshape(1, -1))
    y_pred = pm.Normal('y_pred', mu = YX, sd = epsilon, observed = Cp_E)
    start = pm.find_MAP()
    step = pm.NUTS(scaling = start)
    trace_ = pm.sample(5000, step = step, start = start)
  parameters = ['Cdt1', 'CDESkeps']
  pm.traceplot(trace_, parameters)

if __name__=='__main__':
  sys.exit(main())

The error is "setting an array element with a sequence", but I have confirmed that YX and Cp_E both have shape (1, 400).
I hope someone can give me some suggestions. Thanks in advance.

Freakwill · January 11, 2022, 3:14am

I am confused by the code.
Cdt1 and CDESkeps are the two parameters, but they are passed as inputs in YX=gaussian.predict(Z.reshape(1, -1)).

cluhmann · January 11, 2022, 4:16am

Can we see the actual error message?

k.f · November 17, 2022, 9:54am

I have the exact same problem. I have some X and Y values on which I fit a Gaussian process regression (with scikit-learn) so that I can call it within the Bayesian calibration, for which I use PyMC3.

The error message I receive is the following:

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\gaussian_process\_gpr.py:371 in predict
X = self._validate_data(X, ensure_2d=ensure_2d, dtype=dtype, reset=False)

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\base.py:566 in _validate_data
X = check_array(X, **check_params)

File ~\anaconda3\envs\pymc3_venv\lib\site-packages\sklearn\utils\validation.py:746 in check_array
array = np.asarray(array, order=order, dtype=dtype)

ValueError: setting an array element with a sequence.

Simple example on combining Scikit and PyMC3.py (2.6 KB)

cluhmann · November 17, 2022, 7:27pm

The issue here seems to be that you are passing a tensor or theano/aesara random variable (teta) to sklearn which doesn’t know about such things. You are trying to cast the tensor into an array, but this won’t work because tensors are “empty” until runtime. If you are interested in using GPs “inside” of pymc models, I would suggest checking out the GP submodule of pymc itself and the associated notebooks.
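One possible workaround, which nobody in this thread proposes, follows from the fact that a fitted GP's predictive mean is just the algebraic expression k(x_new, X_train) @ K^{-1} Y. The weights K^{-1} Y can be computed once with concrete numbers, leaving only a kernel evaluation and a matrix product to depend on the unknown parameters. Below is a minimal NumPy-only sketch of that idea. It assumes an RBF kernel with a fixed length scale, which will not automatically match the kernel and hyperparameters of a fitted scikit-learn model. The function names and toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between rows of A (n, d) and B (m, d)."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def fit_gp_weights(X_train, Y_train, length_scale=1.0, jitter=1e-8):
    """Precompute alpha = (K + jitter * I)^{-1} Y once, outside any model."""
    K = rbf_kernel(X_train, X_train, length_scale)
    K[np.diag_indices_from(K)] += jitter
    return np.linalg.solve(K, Y_train)

def gp_mean(x_new, X_train, alpha, length_scale=1.0):
    """Predictive mean k(x_new, X_train) @ alpha, using only array algebra."""
    return rbf_kernel(np.atleast_2d(x_new), X_train, length_scale) @ alpha

# Hypothetical toy data: 9 training points with 2 inputs and 2 outputs each.
X_train = np.array([[a, b] for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)])
Y_train = np.column_stack([np.sin(X_train[:, 0]), X_train[:, 0] * X_train[:, 1]])

alpha = fit_gp_weights(X_train, Y_train)
pred_at_train = gp_mean(X_train, X_train, alpha)  # should reproduce Y_train closely
```

To use this inside a model, the NumPy calls in gp_mean would have to be rewritten with the corresponding Theano/Aesara tensor operations so that x_new can be built from random variables. X_train and alpha stay ordinary constant arrays.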
