As a separate approach, you may want to consider a coregionalized GP, as shown here. I think this is essentially what you're after with "add a random effect GP". It creates a GP with a vector-valued output, one element per class, where the outputs are correlated with one another and the degree of correlation is learned during sampling/optimization. The downside is that it can be a bit overconfident and prone to overfitting, but otherwise it performs well.
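To make the "correlated outputs" idea concrete, here is a minimal dependency-free sketch of one common form of coregionalization, the intrinsic coregionalization model, where the joint covariance is the Kronecker product of a between-class matrix `B = W Wᵀ + diag(kappa)` and an ordinary kernel over the inputs. The function names, the RBF kernel choice, and the fixed values of `W` and `kappa` are all assumptions for illustration; in a real model `W` and `kappa` would get priors and be learned, and this is not necessarily how the linked example is written.

```python
import math


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel on scalar inputs (assumed kernel choice)."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def coreg_matrix(W, kappa):
    """Between-class covariance B = W W^T + diag(kappa)."""
    n_out = len(W)
    return [
        [
            sum(wa * wb for wa, wb in zip(W[a], W[b])) + (kappa[a] if a == b else 0.0)
            for b in range(n_out)
        ]
        for a in range(n_out)
    ]


def icm_cov(xs, W, kappa, lengthscale=1.0):
    """Joint covariance over all (class, input) pairs: kron(B, K_x).

    Row/column index is class * len(xs) + point.
    """
    B = coreg_matrix(W, kappa)
    n_out, n = len(B), len(xs)
    K = [
        [B[a][b] * rbf(xs[i], xs[j], lengthscale) for b in range(n_out) for j in range(n)]
        for a in range(n_out)
        for i in range(n)
    ]
    return K, B


def output_correlation(B, a, b):
    """Implied correlation between the GPs for classes a and b."""
    return B[a][b] / math.sqrt(B[a][a] * B[b][b])


# Hypothetical values: three classes, three input locations, rank-1 W.
xs = [0.0, 0.5, 1.0]
W = [[1.0], [0.8], [-0.5]]
kappa = [0.1, 0.1, 0.1]

K, B = icm_cov(xs, W, kappa)
print(len(K), "x", len(K[0]))       # size of the joint covariance
print(output_correlation(B, 0, 1))  # classes 0 and 1: positive
print(output_correlation(B, 0, 2))  # classes 0 and 2: negative
```

The sign and size of the off-diagonal entries of `B` are what the model learns; when they shrink toward zero the classes behave like independent GPs, and `kappa` controls how much class-specific variance each output keeps.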