Use a saved Gaussian process model from scikit-learn in PyMC3?

I am very new to PyMC3 and to Python itself, so please bear with me if I get something wrong.

I have a Gaussian process regression model (gp) from scikit-learn with 6 inputs, saved using pickle. I also have a file that contains values for 3 of those inputs (x1, x2, x3) and the corresponding output (y).

My question: how can I calibrate the remaining 3 inputs (6 - 3) of gp so that, combined with the 3 known inputs, the model reproduces the known output y approximately, or with minimal error? Currently I am thinking of something like this:

import joblib
import numpy as np
import pymc3 as pm
import theano.tensor as tt

gp = joblib.load('finalized_model_gp.pkl')                # load the saved scikit-learn GP model
x3, y1, num_records = load_file_data('DATAFIELD.csv', 3)  # returns the 3 known inputs (x1, x2, x3) and the output y

def predict_y(x1, x2, x3, Q1, Q2, Q3):
    # gp.predict expects one 2-D array of shape (n_samples, 6), not six separate arguments
    X = np.column_stack(np.broadcast_arrays(x1, x2, x3, Q1, Q2, Q3))
    return gp.predict(X)

with pm.Model() as model:
    Q1 = pm.Uniform('Q1', 0, 1)  # input 4; one of the 3 remaining inputs to calibrate, with a known prior
    Q2 = pm.Uniform('Q2', 0, 1)  # input 5
    Q3 = pm.Uniform('Q3', 0, 1)  # input 6

    # Maybe gp should be used here, via predict_y(), to calibrate Q1, Q2, Q3 with x1, x2, x3 against y1.
    # mu is not defined yet; this is the part I am unsure about.
    y = pm.MvNormal('likelihood', observed=y1, mu=mu, cov=tt.eye(num_records), shape=num_records)
    trace_ = pm.sample(500, progressbar=True, discard_tuned_samples=False)  # some sampling

I am not sure whether this can be done with PyMC3, or whether I should look into some other approach.

Hi @naitikshukla,
(This is the same as "Finding posterior for calibration using saved Gaussian model in pymc3", right? Sorry about the non-response.)

What you want to do can certainly be done in PyMC3. However, I would suggest building the GP in PyMC3 and calibrating it there, so that you can perform the inference in one coherent framework.

Otherwise, if you still want to use the fitted GP from scikit-learn, you can extract the parameters from the fitted GP, namely the mean and standard deviation (or standard error) of Q1, Q2, Q3. Then, instead of a Uniform, use a Normal distribution to define each of them in the pm.Model:

with pm.Model() as model:
    Q1 = pm.Normal('Q1', mu_q1, sd_q1)
    Q2 = pm.Normal('Q2', mu_q2, sd_q2)
    Q3 = pm.Normal('Q3', mu_q3, sd_q3)
    gp = ... # use Q1, Q2, Q3 to build a GP in pymc3, and calibrate it using the new observation
             # more details in http://docs.pymc.io/notebooks/GP-Marginal.html

I guess Q1, Q2, Q3 are the parameters of the kernel function, so this is similar to

with pm.Model() as gp:
    ℓ = pm.Gamma("ℓ", alpha=2, beta=1)
    η = pm.HalfCauchy("η", beta=5)
    cov = η**2 * pm.gp.cov.Matern52(1, ℓ)
    gp = pm.gp.Marginal(cov_func=cov)

    σ = pm.HalfCauchy("σ", beta=5)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=σ)

in your case it will go like this:

with model: # the model you define above with the Q1, Q2, Q3
    cov = Q1**2 * pm.gp.cov.Matern52(1, Q2)
    gp = pm.gp.Marginal(cov_func=cov)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=Q3)

Of course, you need to make sure that Q1, Q2, Q3 correspond to the right input parameters for pm.gp.
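
To make the role of each of Q1, Q2, Q3 in the snippet above concrete, here is a minimal pure-Python sketch of the covariance that model implies. It assumes one-dimensional inputs and that `noise` is a standard deviation, so its square is added to the diagonal. The function names `matern52` and `noisy_cov` and the numbers are illustrative only, not part of PyMC3.

```python
import math

def matern52(x1, x2, eta, ell):
    """Scaled Matern 5/2 kernel: eta**2 * (1 + a + a**2 / 3) * exp(-a),
    where a = sqrt(5) * |x1 - x2| / ell."""
    a = math.sqrt(5.0) * abs(x1 - x2) / ell
    return eta**2 * (1.0 + a + a**2 / 3.0) * math.exp(-a)

def noisy_cov(xs, eta, ell, sigma):
    """Covariance of the observations: kernel matrix plus sigma**2 on the diagonal."""
    n = len(xs)
    return [
        [matern52(xs[i], xs[j], eta, ell) + (sigma**2 if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

# Q1 plays the role of eta (amplitude), Q2 of ell (length scale),
# and Q3 of sigma (noise standard deviation). Values are made up.
K = noisy_cov([0.0, 0.5, 2.0], eta=1.5, ell=1.0, sigma=0.1)
```

The diagonal of `K` equals `eta**2 + sigma**2`, and the off-diagonal entries shrink as the points move apart relative to `ell`, which is a quick way to check that each Q ended up in the intended slot.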

Let me know if there is anything unclear :slight_smile:

Thank you for writing in detail, but I am still confused about where my pretrained scikit-learn model is used in the approach you shared.

I can get the mean and sd for Q1, Q2 and Q3, but do I have to build the GP again in PyMC3 for sampling, i.e. use only the PyMC3 GP for this?
Thanks in advance.

You will have to build the GP again in PyMC3, but you can set its parameters using the fitted parameters from scikit-learn (or build the priors accordingly from these fitted parameters). That's what I meant by:

with model: # the model you define above with the Q1, Q2, Q3
    cov = Q1**2 * pm.gp.cov.Matern52(1, Q2)
    gp = pm.gp.Marginal(cov_func=cov)
    y_ = gp.marginal_likelihood("y", X=X, y=y, noise=Q3)

Q1, Q2, Q3 are the parameters from the pretrained scikit-learn GP.
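
One detail worth checking when copying values over: assuming the scikit-learn kernel was something like `ConstantKernel() * Matern(nu=2.5) + WhiteKernel()` (an assumption; the thread does not show the kernel), scikit-learn stores the amplitude and the noise as variances (`constant_value`, `noise_level`), whereas the PyMC3 snippet squares Q1 itself and treats `noise` as a standard deviation. A small sketch of the conversion, with a hypothetical helper name and placeholder numbers:

```python
import math

def sklearn_to_pymc3(constant_value, length_scale, noise_level):
    """Convert variance-style scikit-learn hyperparameters to the
    (amplitude, length scale, noise sd) used as Q1, Q2, Q3 above."""
    return {
        "Q1": math.sqrt(constant_value),  # amplitude sd, because the model uses Q1**2
        "Q2": length_scale,               # length scale keeps the same meaning
        "Q3": math.sqrt(noise_level),     # noise sd, because noise_level is a variance
    }

# Placeholder values; in practice they would be read from the fitted
# model's kernel_ attribute (e.g. via its get_params() method).
params = sklearn_to_pymc3(constant_value=2.25, length_scale=0.8, noise_level=0.01)
```

These values can then be used either as fixed constants in place of Q1, Q2, Q3, or as the means of the Normal priors suggested earlier.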