Need help to make pymc find optimal parameters

Falco_Peregrinus · September 22, 2023, 8:39pm

I have a set of measurements in the pandas dataframe, The first two columns are two parameters, and depending on their values, I also have 4 features. Let’s say I perform a new observation, and I have obtained a new set of values for those 4 features. My goal is to find in that lookup table the best match of my observation to that lookup table dataset.
How am I planning to do that? I am “varying” those two parameters, for each combination of parameters I get a set of features, and then I calculate chi2 between those “new” features and the features from the dataset. So, instead of an exhaustive grid search of the minimum chi2, I am planning to search intelligently by using MCMC.
I managed to do it using the emcee library, but also using the pymc3. Although, pymc3 is at least 10 times slower. Therefore, I tried to implement the same thing using Pymc, but I am failing.
I need your help, I am not sure how to perform lookup in the table since the parameters must be distributions. I will paste my minimal reproducible example below, and I think it is self-explanatory. Basically, I need help to finish my code

import pandas as pd
import pytensor.tensor as pt
import pymc as pm
import numpy as np

    #this function should return features(data) for given parameters
def model_getFeaturesFromParams(Param1, Param2, lookup_table_arr):
    lookup_table_tensor = pt.as_tensor_variable(lookup_table_arr)
    # Compute the index based on the given Param1 and Param2
    print(lookup_table_tensor[:,0])
    a = pt.eq(lookup_table_tensor[:, 0], Param1)
    b = pt.eq(lookup_table_tensor[:, 1], Param1)
    print(a,b)
    index = (lookup_table_tensor[:, 0] == Param1) & (lookup_table_tensor[:, 1] == Param2)
    
    #  Check if there's a matching row in the lookup table
    if pt.all(index):
        # Extract the features from the matching row
        features = lookup_table_tensor[index][0][2:]
        return features
    else:
        # Handle the case when Param1 and Param2 are not found in the table
        return [0, 0, 0, 0]  # Modify this as needed

def chiSquaredLikelihood(observed_features, observed_errors, predicted_features):
        chi2 = pt.sum(pt.sqr((observed_features - predicted_features) / observed_errors))
        return -2 * chi2


def run_pymc():
    observed_features = np.array([2,2,4,4])
    observed_errors = np.array([0.03, 0.02, 0.02, 0.03])

    lookup_table = [
        [1,2, 1,1,2,2],
        [2,4, 2,2,4,4],
        [4,8, 4,4,8,8],
    ]
    columns = ["Param1", "Param2", "feature1", "feature2", "feature3", "feature4"]
    df = pd.DataFrame(lookup_table, columns=columns)
    lookup_table_arr = df.values

    with pm.Model() as model:
        Param1 = pm.Uniform('Param1', lower=1, upper=4)
        Param2 = pm.Uniform('Param2', lower=2, upper=8)
        Param1_tensor = pt.as_tensor_variable(Param1)
        Param2_tensor = pt.as_tensor_variable(Param2)

        # predicted_features = model_getFeaturesFromParams(Param1_tensor, Param2_tensor, df)
        predicted_features = model_getFeaturesFromParams(Param1_tensor, Param2_tensor, lookup_table_arr)

        likelihood = pm.DensityDist(
            "likelihood",  logp=lambda x: chiSquaredLikelihood(observed_features, observed_errors, x), 
                    observed=predicted_features.eval()
                    )
        trace = pm.sample(100,100, chains=4, return_inferencedata=True,
                                progressbar=True, random_seed=42, discard_tuned_samples=True, cores=4, step=pm.Metropolis())
    return trace

if __name__ == "__main__":
     run_pymc()

Topic		Replies	Views
Using a lookup table in pymc model version agnostic modeling	12	785	December 11, 2022
Feature discussion: Grid estimation with PyMC? Development	4	690	May 14, 2020
Predictions with PYMC, Dont work on test data set modeling	3	53	January 18, 2025
Help with pm.Potential Questions	10	1147	January 13, 2021
Beginning user, problem set correctly (experimental data fit)? v5 modeling	2	328	January 16, 2023

Need help to make pymc find optimal parameters

Related topics