Can't recover the mean of a GP output

Hello, I am using a Gaussian process to model an output (the reward) in terms of inputs (the states). According to the Bellman equation, the reward at the current step should satisfy this formula:

R_{T+1} = Q_{T} - γ Q_{T+1}.

My goal is to estimate the value functions Q_{T} and Q_{T+1} as latent variables, so at the first time step, where I have only one output, I need to estimate these two latent variables. I understand that each of their values will not be accurate from just one observation, but shouldn't the difference of their posterior means, with the discount factor γ, at least recover the output value from the first step, regardless of the noise?
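As a quick closed-form sanity check (a sketch with assumed numbers, none of which come from the actual model): with independent zero-mean normal priors on the two value functions, the combination D = Q_T − γ Q_{T+1} is itself Gaussian a priori, and its posterior mean after one observation is a shrunk version of the reward rather than the reward itself.

    # Gaussian conjugate update for D = Q_T - gamma * Q_{T+1} (assumed numbers).
    gamma, s_q, s_n = 0.9, 1.0, 0.1     # discount, prior std of each Q, noise std
    R = 1.5                             # an illustrative single observed reward
    tau2 = s_q**2 * (1 + gamma**2)      # prior variance of D under zero-mean priors
    post_mean_D = tau2 / (tau2 + s_n**2) * R
    print(post_mean_D)                  # equals R exactly only when tau2 >> s_n**2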

This is how I am coding the mean function:

    import pytensor.tensor as pt  # assumed: pt is pytensor.tensor

    def custum_nonparametric_mean_function(a_v_function_params, disc_factor):
        # Mean vector R_t = Q_t - gamma * Q_{t+1} for each consecutive pair of Q params.
        means_tensor = pt.as_tensor_variable([a_v_function_params[i] - disc_factor * a_v_function_params[i + 1]
                                              for i in range(len(a_v_function_params) - 1)])
        return means_tensor

The a_v_function_params argument is a list of the parameter priors, which grows by one at each time step.
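For context, this is roughly how such a mean function could be wired into a PyMC model (a sketch under assumed names and values; the priors and likelihood here are illustrative, not the original code):

    import pymc as pm

    disc_factor = 0.9
    rewards = [1.5, 0.7, -0.3]   # illustrative observed rewards, one per time step

    with pm.Model():
        # one value-function parameter per time step, plus one for the final step
        a_v_function_params = [pm.Normal(f"q_{t}", mu=0.0, sigma=1.0)
                               for t in range(len(rewards) + 1)]
        # mean vector R_t = Q_t - gamma * Q_{t+1} for every consecutive pair
        mu = custum_nonparametric_mean_function(a_v_function_params, disc_factor)
        pm.Normal("reward", mu=mu, sigma=0.1, observed=rewards)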

Could you post a minimal reproducible example that shows the error? It's a bit hard to say from this information alone. The for loop in the mean function, though, could potentially be the issue.


Thank you very much for your reply. I think the problem was the prior I was using for the value functions. Before, I used a naive normal prior with zero mean; when I changed it to a prior whose mean is the mean of the reward, with some variance, the difference between the two posterior means now seems to be more or less close to the reward.
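For anyone who hits the same thing, the change described above might look roughly like this (a sketch; the exact prior form and values are assumed, not taken from the original code):

    import numpy as np
    import pymc as pm

    gamma = 0.9
    r_obs = np.array([1.5])                # illustrative single observation

    with pm.Model():
        # priors centred on the observed reward mean instead of zero
        q_t = pm.Normal("q_t", mu=r_obs.mean(), sigma=1.0)
        q_t1 = pm.Normal("q_t1", mu=r_obs.mean(), sigma=1.0)
        pm.Normal("reward", mu=q_t - gamma * q_t1, sigma=0.1, observed=r_obs)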
