Marginal student t process and misleading parameters


#1

I am interested in using TP but it currently does not support observations (well, technically it might using kwargs, but conditional would not work correctly). I’ve been reading the Shah paper and I don’t see any problems mathematically in adding this, so it seems straightforward to add a MarginalTP. Any interest in this idea?

While investigating this I’ve come across a couple of more general questions:

  • This one is probably for @bwengals, and forgive me if I’m being dense here. What is the reason for splitting processes into Latent and Marginal versions? Why not just have GP and TP classes with a prior method that can take y as an argument to determine whether or not it is observed? Then conditional would check for y and use it if it exists. This would save a fair amount of code, especially if MarginalTP gets implemented. I understand that this may be too big of a change, but I’m just curious!
  • This one is in regards to the TP and MvStudentT arguments. It seems like cov_func and cov are misleading parameter names, since they aren’t actually the covariances of the distributions. Rather, they refer to the Sigma parameter of MvStudentT, which is related to the covariance by nu * Sigma / (nu-2). Thus, here, for example, it looks like apples and oranges are being compared when using the same cov_func in a GP and TP. Can these arguments be changed so that they really are covariances, or at least be better documented so that people know that they aren’t actually what they seem to be?

#2
  • The reason for using Latent was a bit controversial at the time, you can find the conversation here https://github.com/pymc-devs/pymc3/pull/2444 where @bwengals also gave some detail response. I personally find his blog post on this excellent.

  • As for the second point, you are right: in the definition of MvStudentT, Sigma and Cov of MvStudentT is not the same thing, but I think Sigma was mistakenly called as Cov previously. We really should be more precise in the docstring about this - mind raising an issue and/or PR?


#3

Yes, it looks possible to add a MarginalTP implementation, thanks for pointing this out! I should have read the paper more closely when implementing. I just assumed the TP prior + Gaussian noise was not-conjugate and didn’t consider it further. To accomplish MarginalTP, they add white noise to the diagonal of K. The only (maybe) tricky part there is implementing conditional to handle both the with noise and without noise cases.

I also definitely agree with you guys about the bad cov & Sigma naming in TP. Thanks for catching that @jordan-melendez, I agree with what you suggest on changes for those.

I’m sure you noticed but the links explaining how I was thinking then are a bit dated, the syntax changed a bit since then. Now I’d say I like the GP API mostly for clarity when reading the source code and PyMC3 model code, and ease of adding new GP implementations, at the expense of some code duplication. Each implementation is then completely separate and isolated, and it’s clear how they were done (I hope!) so bugs or problems can be found easily.

I’d argue that whether or not Marginal or Latent is used isn’t just whether or not y is observed, but also whether the GP is needed anywhere else. In Marginal the likelihood and GP prior have to be conjugate, and the GP isn’t needed anywhere else in the model, and you don’t want to see it in the trace, so it can be safely integrated out. Latent is for when you don’t want to do that. In PyMC3 it’s really easy to implement stuff like this, where Latent is really important.

Also, some more specialized GP implementations or approximations may not have both a Marginal and Latent version, like say VFE. The separate naming for each GP implementation follows other packages like GPFlow and GPy, but is mapped to be more PyMC3ish, so each implementation is a constructor for a named, symbolic random variable with a logp and random method. So then Marginal and Latent should be named differently because they are different random variables (even if they both have MvNormal pdfs).

I started out having just one GP class that would parse arguments to infer what implementation to use, but the arg parsing logic got out of hand pretty quickly, and adding new implementations was very difficult. It also doesn’t really work well, since with how PyMC3 is designed, the GP prior and conditional really need to be represented by different random variables. The inputs for conditional aren’t necessarily known when NUTS is run, and we don’t always have to use NUTS for those anyway.

While I’m rambling, one thing that does bug me though is stuff like here. I still not 100% sure there would never be a reason for having an unobserved MarginalGP… It’s certainly not very useful often, but I didn’t want to block that as a possibility so I put that there just to be safe.


#4

Thank you both very much for the helpful answers! I definitely better understand the reasons for splitting the GPs now, and it seems like the right call. Indeed, the is_observed argument always confused me, but it doesn’t hurt to have it!

I think I’ll go ahead and raise an issue about the cov arguments so we can figure out the right course of action here. I might get around to a PR for MarginalTP soon too.


#5

Sure! Of course can’t be sure that it’s the best way to go, so suggestions are always welcome. Glad that helped. Thinking about it more I’m pretty sure the is_observed thing can go, since a symbol for the gp in Marginal is marginalized out so it can never be in the graph or show up in the trace…