Hello,
Let's say I have a dataset where I can observe 10 inputs when creating it, but when I use the model for prediction on real cases, I will only have 2 of these inputs available. The distributions of the other 8 are known for the whole dataset, but their values are not known for any individual sample.
I am wondering what an appropriate approach would be to increase the posterior predictive variance to account for this uncertainty. One thought is to follow a frequentist approach: train the model on all 10 inputs, then, for each single prediction, fix the 2 known inputs while sampling the other 8 by Monte Carlo simulation from their known distributions, and report the resulting distribution of the prediction. But I would rather approach this problem from the Bayesian paradigm, so here are my thoughts.
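For clarity, here is a minimal sketch of that Monte Carlo idea. All names are hypothetical: `model` is any regressor already fitted on all 10 inputs (anything with a scikit-learn-style `.predict`), `x_known` holds the 2 observed inputs, and `sample_unknown` draws the 8 uncertain inputs from their known distributions.

```python
import numpy as np

def mc_predictive(model, x_known, sample_unknown, n_draws=5000, seed=0):
    rng = np.random.default_rng(seed)
    draws = np.empty(n_draws)
    for i in range(n_draws):
        # fix the 2 known inputs, sample the other 8
        x_full = np.concatenate([x_known, sample_unknown(rng)])
        draws[i] = model.predict(x_full.reshape(1, -1))[0]
    return draws  # empirical predictive distribution

# e.g. the 8 uncertain inputs independently normal with known moments
mu_u, sd_u = np.zeros(8), np.ones(8)
sample_unknown = lambda rng: rng.normal(mu_u, sd_u)
# draws = mc_predictive(model, x_known, sample_unknown)
# np.percentile(draws, [2.5, 97.5])  # interval reflecting input uncertainty
```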
First, following my previous question, I cannot include the uncertain inputs in this way, because it means I would have a parameter for each data sample. I want these uncertain inputs to be shared between all the samples, just like the covariance function parameters.
Another approach would be to include them in the mean function as coefficients, using their known distributions as priors. But to my understanding, if I use, for example, a linear mean function, I would need to find a way to make them coefficients of the 2 known inputs. That may not be optimal, and it does not represent the fact that they are independent variables just like the other 2 inputs; they should not depend on those 2 inputs.
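Concretely, with a linear-style mean the 8 uncertain inputs $u_1,\dots,u_8$ could only enter as something like

$$m(x_1, x_2) = \beta_0 + \sum_{j=1}^{8} u_j \, g_j(x_1, x_2),$$

where the functions $g_j$ of the 2 known inputs are something I would have to invent; the $u_j$ then act on $x_1$ and $x_2$ instead of standing as independent variables in their own right.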
If I include all 10 inputs in the regular way, I get good predictions, and the true value falls inside the 95% credible interval, which is my goal. However, I think this is biased, because in reality we will not have all the inputs available at prediction time. For example, in this image I am including some of the important unknown inputs in the training.
Here, on the other hand, I am training on only the 2 known inputs. My problem is that I want to find a way to increase the uncertainty in the prediction so that, most of the time, the actual value lies within the credible interval, and I also want to avoid resorting to the frequentist approach.
So, in short: is there a way to have the 8 unknown inputs (with their 8 known distributions) as shared parameters between all the data points, while treating them as independent variables just like the 2 known inputs?
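To make the ask concrete, here is a minimal sketch of the mechanics I have in mind, assuming PyMC v5 and a GP with a squared-exponential kernel; the data and the moments of the 8 uncertain inputs (`X2`, `y`, `mu_u`, `sd_u`) are toy placeholders, and I have not verified that this samples well.

```python
import numpy as np
import pymc as pm
import pytensor.tensor as pt

# toy stand-ins for the real data
rng = np.random.default_rng(1)
X2 = rng.normal(size=(30, 2))          # the 2 inputs known at prediction time
y = rng.normal(size=30)                # response
mu_u, sd_u = np.zeros(8), np.ones(8)   # known distributions of the 8 inputs
n = X2.shape[0]

with pm.Model() as model:
    # one shared draw of the 8 uncertain inputs, used for every row,
    # just like a covariance-function hyperparameter
    u = pm.Normal("u", mu=mu_u, sigma=sd_u, shape=8)

    # place the shared inputs next to the per-row known inputs:
    # broadcasting u against a column of ones gives an (n, 8) block
    X_full = pt.concatenate(
        [pt.as_tensor_variable(X2), pt.ones((n, 1)) * u], axis=1
    )

    # ARD squared-exponential kernel over all 10 input dimensions
    ls = pm.Gamma("ls", alpha=2.0, beta=1.0, shape=10)
    eta = pm.HalfNormal("eta", sigma=1.0)
    cov = eta**2 * pm.gp.cov.ExpQuad(input_dim=10, ls=ls)

    gp = pm.gp.Marginal(cov_func=cov)
    sigma = pm.HalfNormal("sigma", sigma=1.0)
    y_obs = gp.marginal_likelihood("y_obs", X=X_full, y=y, sigma=sigma)
```

In principle, prediction would then build the test rows from the 2 known inputs plus the same shared `u`, so the posterior uncertainty in `u` propagates into the predictive interval. Whether this is a statistically sound way to treat the 8 inputs is exactly what I am asking.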