Finding posterior for calibration using saved Gaussian model in pymc3

https://stackoverflow.com/questions/47216025/finding-posterior-for-calibration-using-saved-gaussian-model-in-pymc3

I have two files of sensor data:

  1. Simulated data file: (inputs: xc1, xc2, …, parameters: Q1, Q2, …, output: yc)
  2. Field data file: (inputs: xf1, xf2, …, output: yf)

I have a problem with the Bayesian calibration of this sensor data.

What I want is to train/find values for the calibration parameters (Q) which, when used with xf, will yield yf' ≈ yf.

There are approaches which suggest calling a Gaussian process on every run of MCMC sampling for calibration. They combine both the simulation and the field data for this purpose.

That is a fully Bayesian approach, since what is learned in the previous step is used in the next run, with the Gaussian process acting as the simulator.

But my challenge now is that I still want a Bayesian approach, except that I already have a pre-trained Gaussian process regressor n(xc, Q), trained on (xc, Q) from the simulated data, and I am not merging the simulated and field data.

So I am left with xf, yf and the pretrained model n(xc, Q).

I know the prior for every Q, i.e. Uniform(0, 1).

  • Now I am stuck, with no idea how to use the pretrained Gaussian model
    to find the calibrated Q that best describes yf.
  • Currently I am looking at pymc3 for this approach, but I am not finding
    enough support in the documentation for this kind of usage.

Below is a sample of the field data:

    yf,xf1,xf2,xf3
    1562,1.69166666666667,37.9301075268817,204.021505376344
    1395,0.0610119047619034,51.1919642857143,181.846726190476
    1279,4.35067204301076,50.6075268817204,219.407258064516

Link for trained Gaussian model from simulated-data: https://github.com/naitikshukla/BEEHub/tree/master/initial/finalized_model_gp.pkl
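For reference, `load_obs_data` in the code below is a user-defined function. Here is a minimal stdlib-only sketch of what it might look like (an assumption, not the actual implementation, which presumably returns pandas objects given the `yf.values` call later), demonstrated on the sample rows above:

```python
import csv
import io

# Hypothetical sketch of the user-defined load_obs_data helper: assumed to
# return the inputs xf, the outputs yf, and the number of rows n.
def load_obs_data(source, n_inputs):
    rows = list(csv.DictReader(source))
    xf = [[float(r['xf%d' % i]) for i in range(1, n_inputs + 1)] for r in rows]
    yf = [float(r['yf']) for r in rows]
    return xf, yf, len(rows)

# The sample field data from the question, inlined so the demo is self-contained
sample = io.StringIO(
    "yf,xf1,xf2,xf3\n"
    "1562,1.69166666666667,37.9301075268817,204.021505376344\n"
    "1395,0.0610119047619034,51.1919642857143,181.846726190476\n"
    "1279,4.35067204301076,50.6075268817204,219.407258064516\n"
)
xf, yf, n = load_obs_data(sample, 3)
print(n, yf, xf[0])
```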

Currently I am trying the code below in pymc3:

    import joblib
    import pymc3 as pm
    import theano.tensor as tt

    if __name__ == '__main__':
        # Load the pretrained scikit-learn GP model
        gp = joblib.load('beehub/testdataset/finalized_model_gp.pkl')
        # Load field data (UDF: takes the filename and the number of xf columns)
        xf, yf, n = load_obs_data('DATAFIELD.csv', 3)

        with pm.Model() as model:
            yf_flt = yf.values.reshape(1, 12)
            # Calibration parameters as free random variables with Uniform(0, 1)
            # priors (they must not be "observed", otherwise they cannot be sampled)
            Q1 = pm.Uniform('Q1', 0, 1)
            Q2 = pm.Uniform('Q2', 0, 1)
            Q3 = pm.Uniform('Q3', 0, 1)
            lambda_e = pm.Gamma('Noise', 10, 0.03)

            # Open question: how do I plug the pretrained model in here,
            # i.e. something like mu = gp.predict(xf, Q1, Q2, Q3)?
            y = pm.MvNormal('likelihood', observed=yf_flt,
                            mu=..., cov=lambda_e * tt.eye(n), shape=(n,))

            # Run MCMC sampling
            start = pm.find_MAP()
            step = pm.NUTS()
            trace_ = pm.sample(500, step, progressbar=True,
                               discard_tuned_samples=False)
            trace = trace_[500:]

        # Plot traces for the calibration parameters
        pm.traceplot(trace, varnames=['Q1', 'Q2', 'Q3'])

        # Draw samples from the posterior predictive
        ppc1 = pm.sample_ppc(trace, samples=500, model=model)

[Note]: The script used to train the GP model (train_gp.py) is also in the GitHub link; it was trained on numpy arrays of the data.

Sorry for the long post, but I was not sure exactly what to ask, so I wrote down as much as I understand. Please correct me if I am wrong anywhere in my assumptions, but the basic goal is simple: use the pretrained model to find the best Q parameters for the field data yf.

See also: Use saved gaussian model from sci-kit in pymc3?