Can't recover the mean of a GP output

cola · December 21, 2024, 3:12pm

Hello, I am using a Gaussian process to model an output (Reward) in terms of inputs (states). According to the Bellman equation, the reward at the current step should satisfy this formula:

R_{T+1} = Q_{T} - γ Q_{T+1}.

My goal is to estimate the value functions Q_{T} and Q_{T+1} as latent variables. At the first time step I have only one output, so I need to estimate both latent variables from that single observation. I understand that neither value will be accurate from one observation alone, but shouldn't the difference of their posterior means, with the discount factor γ (that is, Q_{T} - γ Q_{T+1}), at least recover the output value from the first step, regardless of the noise?
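A quick way to test the intuition in the question is the simplest conjugate case. The sketch below rests on assumptions that are not in the original model: independent Normal priors on Q_T and Q_{T+1} with a shared variance, Gaussian observation noise, no GP covariance term, and a single observed reward y. The helper name posterior_q is hypothetical. It computes the exact Gaussian posterior of the two latent values given y = Q_T - γ Q_{T+1} + noise.

```python
import numpy as np


def posterior_q(y, gamma, prior_mean, prior_var, noise_var):
    """Exact Gaussian posterior of (Q_T, Q_{T+1}) after observing one reward.

    Hypothetical helper. Assumed model (illustrative only):
        Q ~ N(prior_mean, prior_var * I)          (independent priors)
        y = Q_T - gamma * Q_{T+1} + eps,  eps ~ N(0, noise_var)
    """
    m0 = np.asarray(prior_mean, dtype=float)  # prior means of (Q_T, Q_{T+1})
    S0 = prior_var * np.eye(2)                # prior covariance
    a = np.array([1.0, -gamma])               # y depends on Q only through a @ Q
    s = a @ S0 @ a + noise_var                # prior predictive variance of y
    k = S0 @ a / s                            # gain vector
    m_post = m0 + k * (y - a @ m0)            # posterior mean
    S_post = S0 - np.outer(k, a @ S0)         # posterior covariance
    return m_post, S_post


gamma = 0.9
for noise_var in (1.0, 0.1, 1e-6):
    m, _ = posterior_q(y=1.0, gamma=gamma, prior_mean=[0.0, 0.0],
                       prior_var=1.0, noise_var=noise_var)
    # posterior mean of the difference Q_T - gamma * Q_{T+1}
    print(noise_var, m[0] - gamma * m[1])
```

Under these assumptions, with zero prior means the posterior mean of the difference is v_d / (v_d + σ²) · y, where v_d = prior_var · (1 + γ²) is the prior variance of the difference. For y = 1, γ = 0.9 and unit variances that is 1.81 / 2.81 ≈ 0.64 rather than 1. The difference is shrunk toward its prior mean and only approaches y as the noise variance goes to zero, so it is not expected to reproduce the observation regardless of the noise.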

This is how I am coding the mean function:

    import pytensor.tensor as pt

    def custum_nonparametric_mean_function(a_v_function_params, disc_factor):
        # Entry i is Q_i - disc_factor * Q_{i+1}, one per consecutive pair of priors
        means_tensor = pt.as_tensor_variable([a_v_function_params[i] - disc_factor * a_v_function_params[i + 1] for i in range(len(a_v_function_params) - 1)])
        return means_tensor

Here a_v_function_params is a list of the value-function parameter priors, and its length grows by one at each time step.
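For reference, the mean function produces one entry per consecutive pair of value-function parameters, so n parameters give n - 1 means, one per observed reward. A minimal NumPy sketch of the same arithmetic with the Python loop replaced by array slicing (bellman_mean is a hypothetical name, and concrete floats stand in for the PyTensor priors):

```python
import numpy as np


def bellman_mean(q_values, disc_factor):
    """Entry t is Q_t - disc_factor * Q_{t+1}, same arithmetic as the loop version."""
    q = np.asarray(q_values, dtype=float)
    return q[:-1] - disc_factor * q[1:]


q = [3.0, 2.0, 4.0]
loop = [q[i] - 0.5 * q[i + 1] for i in range(len(q) - 1)]
print(bellman_mean(q, 0.5))  # [2. 0.]
print(loop)                  # [2.0, 0.0]
```

The same slicing should carry over to a PyTensor vector (for instance one built with pt.stack from the list of priors), which avoids a Python-level loop when the graph is built.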

bwengals · December 23, 2024, 11:13pm

Could you post a minimum reproducible example that shows the error? It’s a bit hard to say from this information. The for loop in the mean function though could potentially be the issue.

cola · December 28, 2024, 4:10pm

Thank you very much for your reply. I think the problem was the prior I was using for the value functions. Previously I used a naive zero-mean normal prior; after changing its mean to the mean of the reward (with some variance), the difference between the two posterior means is now roughly close to the reward.
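One way to see why the prior mean matters, sketched under simplifying assumptions that are not part of the original model (independent Normal priors Q_T ~ N(m_T, v_T) and Q_{T+1} ~ N(m_{T+1}, v_{T+1}), Gaussian noise with variance σ², no GP covariance term, and a single observed reward y = R_{T+1}): the posterior mean of the difference is a weighted average of its prior mean and the observation.

```latex
d = Q_T - \gamma Q_{T+1}, \qquad
m_d = m_T - \gamma\, m_{T+1}, \qquad
v_d = v_T + \gamma^2 v_{T+1}

\mathbb{E}[d \mid y] = m_d + \frac{v_d}{v_d + \sigma^2}\,(y - m_d)
```

With zero prior means the difference is pulled toward 0. If both prior means are set to the mean reward, then m_d = (1 - γ) times the mean reward, so the shrinkage target moves toward the observed rewards, which is consistent with the improvement described. Under these assumptions the difference still matches y exactly only as σ² → 0 or v_d → ∞.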
