Hierarchical Time Series w/Gaussian Process: Any examples?

I have also recently been struggling with this. What I wound up doing, and I don’t know if this is at all the correct/best way to do it, is make a custom covariance function. However I am still struggling on the custom mean. Here is what I have for the covariance:

class IdCov(pm.gp.cov.Covariance):
    def __init__(self, input_dim, active_dims=None, sigma=1.0):
      super(IdCov, self).__init__(input_dim=input_dim, active_dims=active_dims)
      self.sigma = sigma

    def diag(self, X):
      return tt.alloc(tt.square(self.sigma), X.shape[0])

    def _dist(self, X, Xs):
      return tt.cast(tt.eq(tt.reshape(X, (-1, 1)), tt.reshape(Xs, (1, -1))), 'int8')

    def full(self, X, Xs=None):
      X, Xs = self._slice(X, Xs)
      if Xs is None:
        return tt.alloc(self.sigma * self._dist(X, X), X.shape[0], X.shape[0])
      else:
        return tt.alloc(self.sigma * self._dist(X, Xs), X.shape[0], Xs.shape[0])

You can take your two-dimensional data, where dimension 1 is a continuous feature and dimension 2 is the ID of your groups, and create a combined covariance kernel:

x1, x2 = np.meshgrid(np.linspace(0,1,10), np.arange(1,4))
X = np.concatenate((x1.reshape((30,1)), x2.reshape((30,1))), axis=1)

cov1 = pm.gp.cov.ExpQuad(2, 0.5, active_dims=[0])
cov2 = IdCov(2, sigma=1.0, active_dims=[1])
cov = cov1 * cov2

K = function([], cov(X))()
m = plt.imshow(K, cmap="inferno", interpolation='none'); plt.colorbar(m);

block_diag_kernel

However, I am struggling to develop a mean function that allows you to learn a separate mean for each group. If someone with more PyMC3 expertise could weigh in, that would be great!