Gaussian Processes: Compositional Data

Hello,

Wanted to reach out and see if any community members have attempted to apply GPs to compositional data of the form N×d, where d is the dimension of the counts and N is the number of observations. Here N_i would be the count of observations at time t_i.

To me, this would most accurately be described as a hierarchical model where we place a Dirichlet prior on a multinomial response:
y_i ~ Mult(M_i, p_i)
where y_i is the corresponding row vector in the N×d data matrix, and M_i and p_i are the corresponding total count and proportion vector.

I am under the impression that I should use a GP to model the mean of the Dirichlet vector at a given time t. So I would have

mu_alpha ~ GP(0, K)

where mu_alpha is the mean of my Dirichlet vector and K is my spatio-temporal kernel (I guess for the sake of understanding we can assume the space and time kernels are separable).

This paper seems to outline a similar procedure; however, pages 6-7, which contain some relevant mathematical details, are a little ambiguous as to what is happening (for instance, the vector mu in the link function seems to come out of nowhere): https://arxiv.org/pdf/1903.05036. I do not have a geostats background, so perhaps there is also some terminology I’m missing.

In terms of trying to get on the right track: is my idea correct? Is it practical, given that we may not be able to use some hierarchical Gaussian process approximations? Is there perhaps a more direct way of going about this?

Thanks!

I fed your question into our PyMC-bot; here’s its answer:

Applying Gaussian Processes to Compositional Data

Based on your question about applying Gaussian Processes (GPs) to compositional data, I can provide some guidance on your approach and implementation considerations in PyMC.

Your Proposed Approach

Your idea of using a hierarchical model with a Dirichlet prior on a multinomial response is a reasonable approach for compositional data. The structure you’ve outlined:

y_i ~ Mult(M_i, p_i)

where you want to use a GP to model the mean of the Dirichlet vector at a given time t:

mu_alpha ~ GP(0, K)

This approach makes sense conceptually for modeling compositional time series data [1].
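
Spelled out, the full hierarchy would look like this (the exponential link in the third line is one common choice, discussed further below, not something fixed by the model itself):

y_i ~ Mult(M_i, p_i)
p_i ~ Dirichlet(alpha(t_i))
alpha_j(t) = exp(f_j(t)), for j = 1, ..., d
f_j ~ GP(0, K)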

Implementation Considerations

1. Hierarchical Structure

Your hierarchical structure is similar to what’s shown in Document 1, where you’d use a GP to model the parameters of a Dirichlet distribution. This is a valid approach for compositional data that preserves the simplex constraint (proportions sum to 1) [1].

2. GP Approximation Methods

For practical implementation, you might want to consider using the Hilbert Space Gaussian Process (HSGP) approximation mentioned in Document 2. This would be particularly useful if:

  • Your dataset is large (many time points)
  • You need faster computation than a full GP would allow
  • Your input dimension is low (1D for time or 2D for space-time) [2]

The HSGP approximation reduces computational complexity from O(n³) to O(mn + m), where n is the number of data points and m is the number of basis vectors [2].
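
A minimal sketch of the HSGP version for a 1-D time input (the values of m, the number of basis functions, and c, the boundary factor, are placeholders you would tune to your data):

import numpy as np
import pymc as pm

X = np.linspace(0, 10, 50)[:, None]  # toy (N, 1) array of time points

with pm.Model() as hsgp_model:
    # Same hyperparameters and kernel as a full GP
    length_scale = pm.InverseGamma("length_scale", alpha=2.0, beta=1.0)
    eta = pm.HalfNormal("eta", sigma=1.0)
    cov_func = eta**2 * pm.gp.cov.Matern52(1, ls=length_scale)

    # HSGP approximation: m basis functions per input dimension,
    # boundary factor c controls how far the approximation extends
    gp = pm.gp.HSGP(m=[25], c=1.5, cov_func=cov_func)
    f = gp.prior("f", X=X)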

3. Link Function

For the link function between the GP and the Dirichlet parameters, you’ll need to ensure the parameters remain positive. A common approach is to use an exponential or softmax transformation of the GP outputs to maintain the constraints of the Dirichlet distribution [1].
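
As a quick numeric illustration of both options (made-up latent values, 4 time points by 3 components):

import numpy as np

# Hypothetical latent GP draws: 4 time points, 3 components
f = np.random.default_rng(0).normal(size=(4, 3))

# exp keeps Dirichlet concentration parameters strictly positive
alpha = np.exp(f)

# softmax maps each row onto the simplex (each row of p sums to 1)
p = np.exp(f) / np.exp(f).sum(axis=1, keepdims=True)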

4. Multiple GPs for Multiple Dimensions

Since your compositional data has d dimensions, you could model this using multiple GPs with a shared covariance structure. Document 3 mentions that you can have “multiple GPs in the same problem” and use ARD (Automatic Relevance Determination) for multidimensional inputs [3].
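
As a small sketch of the ARD idea (assuming a 3-column input of [t, x, y]; that column layout is hypothetical), you give the kernel one lengthscale per input dimension:

import pymc as pm

with pm.Model() as ard_model:
    # One lengthscale per input column; columns whose learned
    # lengthscale is large contribute little to the covariance
    ls = pm.InverseGamma("ls", alpha=2.0, beta=1.0, shape=3)
    cov_func = pm.gp.cov.Matern52(input_dim=3, ls=ls)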

Implementation in PyMC

Here’s a sketch of how you might implement this in PyMC:

import pymc as pm
import numpy as np

# Assuming X is your (N, 1) array of time points and Y is your (N, d)
# matrix of counts; the total count per observation is the row sum
N, d = Y.shape
M = Y.sum(axis=1)

with pm.Model() as model:
    # GP hyperparameters
    length_scale = pm.InverseGamma("length_scale", alpha=2.0, beta=1.0)
    eta = pm.HalfNormal("eta", sigma=1.0)

    # Define covariance function over the 1-D time input
    cov_func = eta**2 * pm.gp.cov.Matern52(1, ls=length_scale)

    # One latent GP per component of the composition
    fs = []
    for i in range(d):
        gp = pm.gp.Latent(cov_func=cov_func)
        fs.append(gp.prior(f"f_{i}", X=X))

    # Stack into an (N, d) matrix and exponentiate so the Dirichlet
    # concentration parameters stay positive
    f = pm.math.stack(fs, axis=1)
    alpha = pm.Deterministic("alpha", pm.math.exp(f))

    # Dirichlet-Multinomial likelihood: the Dirichlet prior on p_i is
    # marginalized out, and the call is vectorized over all N observations
    pm.DirichletMultinomial("y", n=M, a=alpha, observed=Y)
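
Once the model is defined, sampling is the standard call (a minimal sketch; draws, tuning, and target_accept will need adjusting to your data):

with model:
    idata = pm.sample(1000, tune=1000, target_accept=0.9)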

Practical Considerations

  1. Separability: Your assumption about separable space and time kernels is valid and commonly used. This can be implemented by taking the product of a spatial kernel and a temporal kernel (see the sketch after this list) [2].

  2. Approximation Methods: If computational efficiency is a concern, the HSGP approach from Document 2 would be beneficial. For very large datasets, you might also consider variational inference methods [2].

  3. Data Containers: Be aware that there have been issues using Data Containers with GPs in older versions of PyMC, as mentioned in Document 4. However, this may be resolved in newer versions [4].
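
Expanding on point 1, here is a minimal sketch of a separable space-time kernel, assuming the input matrix has three columns [t, x, y] (that layout is an assumption for illustration):

import pymc as pm

# Temporal kernel acts on column 0, spatial kernel on columns 1-2;
# active_dims selects which input columns each kernel sees, and the
# product of the two covariance functions gives the separable kernel
cov_time = pm.gp.cov.Matern52(input_dim=3, ls=1.0, active_dims=[0])
cov_space = pm.gp.cov.Matern52(input_dim=3, ls=[1.0, 1.0], active_dims=[1, 2])
cov_func = cov_time * cov_space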

Conclusion

Your approach is theoretically sound. The main challenges will be in the practical implementation, particularly in ensuring computational efficiency and proper transformation of the GP outputs to maintain the constraints of the Dirichlet distribution.

For implementation, I would recommend starting with a simplified version (perhaps with synthetic data) to ensure the model structure works as expected before scaling to your full dataset.

Thank you for your reply, super helpful so far! In particular, some of the code here is really helpful for setting this up from a programming perspective, which is sometimes the hardest part!

From a heuristics perspective, I guess the biggest gap in my understanding is how to motivate the mean vector of the Dirichlet vector. Obviously I could just look at ‘training points’ and pick a mean vector reasonably close to the observed values, define a Kronecker product for the covariance, and then play with different lengthscales along time and space while evaluating the log likelihood of the model, but I’m not sure if there is a more principled way of doing so.

I’m also a bit unsure how to implement this using HSGPs, as the mean of the Dirichlet vector is not zero/stationary (the compositional data has a trend and some periodicity to it); to my understanding this disqualifies an HSGP approach.