I have a set of measurements in a pandas DataFrame. The first two columns are two parameters, and for each combination of their values I have 4 features. Say I perform a new observation and obtain a new set of values for those 4 features. My goal is to find the best match for my observation in that lookup table dataset.
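For concreteness, the brute-force version of that match (the exhaustive search I want to avoid for large tables) looks like this in plain numpy; the table values here are the same toy ones as in my minimal example below:

```python
import numpy as np

# toy lookup table: Param1, Param2, then 4 features per row
table = np.array([
    [1, 2, 1, 1, 2, 2],
    [2, 4, 2, 2, 4, 4],
    [4, 8, 4, 4, 8, 8],
], dtype=float)

observed = np.array([2.0, 2.0, 4.0, 4.0])
errors = np.array([0.03, 0.02, 0.02, 0.03])

# chi2 of the observation against every row's features at once
chi2 = np.sum(((table[:, 2:] - observed) / errors) ** 2, axis=1)
best = np.argmin(chi2)
print(table[best, :2])  # parameters of the best-matching row -> [2. 4.]
```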
How am I planning to do that? I vary those two parameters; for each combination I get a set of features, and then I calculate chi2 between those "new" features and the features from the dataset. So, instead of an exhaustive grid search for the minimum chi2, I plan to search intelligently using MCMC.
I managed to do this using the emcee library, and also using pymc3, although pymc3 is at least 10 times slower. I then tried to implement the same thing using PyMC, but I am failing.
I need your help: I am not sure how to perform the lookup in the table, since the parameters must be distributions. I will paste my minimal reproducible example below; I think it is self-explanatory. Basically, I need help finishing my code.
```python
import pandas as pd
import pytensor.tensor as pt
import pymc as pm
import numpy as np


# This function should return the features (data) for the given parameters
def model_getFeaturesFromParams(Param1, Param2, lookup_table_arr):
    lookup_table_tensor = pt.as_tensor_variable(lookup_table_arr)
    # Compute the row mask based on the given Param1 and Param2
    a = pt.eq(lookup_table_tensor[:, 0], Param1)
    b = pt.eq(lookup_table_tensor[:, 1], Param2)
    index = a & b
    # Check if there's a matching row in the lookup table
    # (this is where I am stuck: `index` is symbolic, so a Python `if` cannot branch on it)
    if pt.all(index):
        # Extract the features from the matching row
        features = lookup_table_tensor[index][2:]
        return features
    else:
        # Handle the case when Param1 and Param2 are not found in the table
        return [0, 0, 0, 0]  # Modify this as needed


def chiSquaredLikelihood(observed_features, observed_errors, predicted_features):
    chi2 = pt.sum(pt.sqr((observed_features - predicted_features) / observed_errors))
    return -2 * chi2


def run_pymc():
    observed_features = np.array([2, 2, 4, 4])
    observed_errors = np.array([0.03, 0.02, 0.02, 0.03])

    lookup_table = [
        [1, 2, 1, 1, 2, 2],
        [2, 4, 2, 2, 4, 4],
        [4, 8, 4, 4, 8, 8],
    ]
    columns = ["Param1", "Param2", "feature1", "feature2", "feature3", "feature4"]
    df = pd.DataFrame(lookup_table, columns=columns)
    lookup_table_arr = df.values

    with pm.Model() as model:
        Param1 = pm.Uniform('Param1', lower=1, upper=4)
        Param2 = pm.Uniform('Param2', lower=2, upper=8)

        predicted_features = model_getFeaturesFromParams(Param1, Param2, lookup_table_arr)

        likelihood = pm.DensityDist(
            "likelihood",
            logp=lambda x: chiSquaredLikelihood(observed_features, observed_errors, x),
            observed=predicted_features.eval(),
        )

        trace = pm.sample(100, tune=100, chains=4, return_inferencedata=True,
                          progressbar=True, random_seed=42,
                          discard_tuned_samples=True, cores=4,
                          step=pm.Metropolis())
    return trace


if __name__ == "__main__":
    run_pymc()
```