I have a set of measurements in the pandas dataframe, The first two columns are two parameters, and depending on their values, I also have 4 features. Let’s say I perform a new observation, and I have obtained a new set of values for those 4 features. My goal is to find in that lookup table the best match of my observation to that lookup table dataset.
How am I planning to do that? I am “varying” those two parameters, for each combination of parameters I get a set of features, and then I calculate chi2 between those “new” features and the features from the dataset. So, instead of an exhaustive grid search of the minimum chi2, I am planning to search intelligently by using MCMC.
I managed to do it using the emcee library, but also using the pymc3. Although, pymc3 is at least 10 times slower. Therefore, I tried to implement the same thing using Pymc, but I am failing.
I need your help, I am not sure how to perform lookup in the table since the parameters must be distributions. I will paste my minimal reproducible example below, and I think it is self-explanatory. Basically, I need help to finish my code
import pandas as pd
import pytensor.tensor as pt
import pymc as pm
import numpy as np
#this function should return features(data) for given parameters
def model_getFeaturesFromParams(Param1, Param2, lookup_table_arr):
lookup_table_tensor = pt.as_tensor_variable(lookup_table_arr)
# Compute the index based on the given Param1 and Param2
print(lookup_table_tensor[:,0])
a = pt.eq(lookup_table_tensor[:, 0], Param1)
b = pt.eq(lookup_table_tensor[:, 1], Param1)
print(a,b)
index = (lookup_table_tensor[:, 0] == Param1) & (lookup_table_tensor[:, 1] == Param2)
# Check if there's a matching row in the lookup table
if pt.all(index):
# Extract the features from the matching row
features = lookup_table_tensor[index][0][2:]
return features
else:
# Handle the case when Param1 and Param2 are not found in the table
return [0, 0, 0, 0] # Modify this as needed
def chiSquaredLikelihood(observed_features, observed_errors, predicted_features):
chi2 = pt.sum(pt.sqr((observed_features - predicted_features) / observed_errors))
return -2 * chi2
def run_pymc():
observed_features = np.array([2,2,4,4])
observed_errors = np.array([0.03, 0.02, 0.02, 0.03])
lookup_table = [
[1,2, 1,1,2,2],
[2,4, 2,2,4,4],
[4,8, 4,4,8,8],
]
columns = ["Param1", "Param2", "feature1", "feature2", "feature3", "feature4"]
df = pd.DataFrame(lookup_table, columns=columns)
lookup_table_arr = df.values
with pm.Model() as model:
Param1 = pm.Uniform('Param1', lower=1, upper=4)
Param2 = pm.Uniform('Param2', lower=2, upper=8)
Param1_tensor = pt.as_tensor_variable(Param1)
Param2_tensor = pt.as_tensor_variable(Param2)
# predicted_features = model_getFeaturesFromParams(Param1_tensor, Param2_tensor, df)
predicted_features = model_getFeaturesFromParams(Param1_tensor, Param2_tensor, lookup_table_arr)
likelihood = pm.DensityDist(
"likelihood", logp=lambda x: chiSquaredLikelihood(observed_features, observed_errors, x),
observed=predicted_features.eval()
)
trace = pm.sample(100,100, chains=4, return_inferencedata=True,
progressbar=True, random_seed=42, discard_tuned_samples=True, cores=4, step=pm.Metropolis())
return trace
if __name__ == "__main__":
run_pymc()