Dirichlet Gaussian Process Model - Suggestions for Improvement

I recently became interested in practical applications of Gaussian Processes, specifically applying GPs to multiclass classification problems. My initial model had K GPs for K classes, whose outputs are passed through the softmax function to become a probability distribution, with a Categorical likelihood over the observations. I then got the idea to swap out the Categorical for a Dirichlet. On paper this should be a superior approach, since it allows detecting very-high-uncertainty predictions. Interestingly, this model can also express percentage mixture ratios, a type of data I encounter often in my work. My problem is that the Dirichlet-GP model exhibits very pathological behavior on real-world data: it almost always defaults to random guessing (i.e. every sample gets approximately \frac{1}{K} in each component). Testing the proposed model on the Iris dataset yields reasonably good results, but applying it to a dataset of cultivar mixtures produces the behavior above.
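
In symbols, the Dirichlet variant is

f_k \sim \mathcal{GP}(0, \kappa_k), \quad \alpha_k(x) = \exp(f_k(x)), \quad y \mid x \sim \mathrm{Dirichlet}(\alpha_1(x), \dots, \alpha_K(x)), \quad k = 1, \dots, K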
The target has around 13 columns representing different cultivars, and each observation is a (possibly pure) mixture of these, i.e. an observation has the form (0.7, 0.3, 0.0, \dots). The columns seem to have reasonably strong correlations. The input matrix is around 500×63, all continuous. For comparison's sake, I trained a relatively simple ANN, which seems to perform quite well. My models look like this; the inputs are standardized, and for the Dirichlet model I also offset zeros and ones by a small perturbation \epsilon = 10^{-4}:
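
Concretely, the offsetting step looks roughly like this (the row renormalization is just to keep observations on the simplex):

# Offset exact 0s/1s so observations lie strictly inside the simplex
import numpy as np

eps = 1e-4
Y_eps = np.clip(Y_train.to_numpy(), eps, 1 - eps)
Y_eps = Y_eps / Y_eps.sum(axis=1, keepdims=True)  # renormalize rows to sum to 1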

# Dirichlet GP general model
import pymc
import pytensor.tensor

with pymc.Model(coords=coords) as cultivar_model:
    _,M=X_train.shape
    factors=Y_train.columns
    n_factors=len(factors)
    gps=[]
    latent_functions=[]
    lower = 1e-4
    if NOISE:
        # Per-class observation noise added to the kernel diagonal
        σ_noise = pymc.TruncatedNormal('σ_noise', mu=3.0, sigma=1.0, shape=n_factors, lower=lower)
    # Per-class ARD lengthscales (one per input column) and kernel amplitudes
    λ = pymc.TruncatedNormal('λ', mu=3.0, sigma=0.8, shape=(n_factors, M), lower=lower)
    η = pymc.TruncatedNormal('η', mu=10.0, sigma=3.5, shape=n_factors, lower=lower)
    for idx,factor in enumerate(factors):
        # Zero-mean GP with an amplitude-scaled ARD ExpQuad kernel
        C = 0
        μ = pymc.gp.mean.Constant(c=C)
        κ_predictive = η[idx]**2 * pymc.gp.cov.ExpQuad(M, ls=λ[idx, :])
        if NOISE:    
            κ_noise = pymc.gp.cov.WhiteNoise(σ_noise[idx])
            κ = κ_noise+κ_predictive
        else:
            κ = κ_predictive
        # Latent (non-marginalized) GP, one independent prior per cultivar
        gp = pymc.gp.Latent(mean_func=μ, cov_func=κ)
        f_raw = gp.prior(f'f_raw_{factor}', x_train_tensor, reparameterize=False)
        gps.append(gp)
        latent_functions.append(f_raw)
    # Stack the K latent functions, then exp-link them to Dirichlet concentrations
    f = pymc.Deterministic('f', pytensor.tensor.stack(latent_functions), dims='cultivars')
    α = pymc.Deterministic('α', pymc.math.exp(f).T)
    y_obs = pymc.Dirichlet('y_obs', observed=y_train_tensor, a=α, shape=Y_train.shape[0])
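
For what it's worth, here is a quick prior-predictive check on the implied concentrations (the quantile summary is just one convenient way to look at it):

# Prior predictive check: what does exp(f) imply for the concentrations?
with cultivar_model:
    prior = pymc.sample_prior_predictive()
# With mu=10 on η, exp(f) can swing across many orders of magnitude;
# the quantiles of α under the prior make this easy to see
print(prior.prior['α'].quantile([0.05, 0.5, 0.95]))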
# Categorical GP
with pymc.Model(coords=coords) as cultivar_model:
    _,M=X_train.shape
    factors=Y_train.columns
    n_factors=len(factors)
    gps=[]
    latent_functions=[]
    lower=1e-4
    if NOISE:
        σ_noise = pymc.TruncatedNormal('σ_noise', mu=3.0, sigma=1.0, shape=n_factors, lower=lower)
    λ = pymc.TruncatedNormal('λ', mu=3.0, sigma=.8 , shape=(n_factors,M), lower=lower)
    η = pymc.TruncatedNormal('η', mu=10.0, sigma=3.5, shape=n_factors, lower=lower)
    for idx,factor in enumerate(factors):
        C=0
        μ=pymc.gp.mean.Constant(c=C)
        κ_predictive = η[idx]**2*pymc.gp.cov.ExpQuad(M, ls=λ[idx,:])
        if NOISE:    
            κ_noise = pymc.gp.cov.WhiteNoise(σ_noise[idx])
            κ = κ_noise+κ_predictive
        else:
            κ = κ_predictive
        gp = pymc.gp.Latent(mean_func=μ, cov_func=κ)
        f_raw = gp.prior(f'f_raw_{factor}', x_train_tensor, reparameterize=False)
        gps.append(gp)
        latent_functions.append(f_raw)
    f=pymc.Deterministic('f', pytensor.tensor.stack(*latent_functions), dims='cultivars' )
    # softmax link: latent GP values become class probabilities (rows sum to 1)
    α = pymc.Deterministic('α', pytensor.tensor.special.softmax(f, axis=0).T)
    # Categorical likelihood; y_train_tensor here holds integer class labels
    y_obs = pymc.Categorical('y_obs', p=α, observed=y_train_tensor)
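
Both models are fit the same way; fitting is essentially the default NUTS call, nothing tuned:

# Fit with NUTS (defaults)
with cultivar_model:
    trace = pymc.sample()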

Any ideas why this isn't working? Is it something with my prior, perhaps? I'll note that MCMC is pretty fast, but posterior predictive sampling is not; it takes almost 2 hours.
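
For context, this is roughly how I generate predictions at new inputs, which I suspect is where the time goes (x_test_tensor mirrors x_train_tensor):

# Predict at held-out inputs via the Latent GP conditionals (the slow step)
with cultivar_model:
    f_stars = [
        gp.conditional(f'f_star_{factor}', x_test_tensor)
        for gp, factor in zip(gps, factors)
    ]
    ppc = pymc.sample_posterior_predictive(
        trace, var_names=[f'f_star_{factor}' for factor in factors]
    )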