Modeling longitudinal data among patients by coupling Gaussian processes with random effects

Hi,

I have longitudinal patient data including age, sex, biomarkers, etc., and a health outcome. Simply put, I aim to predict health outcomes as patients age. I created a toy data set for this purpose.

import numpy as np
import pandas as pd
import pymc3 as pm

data = pd.read_csv('https://gist.githubusercontent.com/essicolo/f31641cee96e855772d4ca4151bc8cdf/raw/48043931bbd619ef4992ed202e45c9a292c44b62/sim-data.csv')
data.head(10)
   id        age         A         B  C  D         y
0   1  15.409806 -1.232430 -0.209489  1  2 -2.094019
1   1  22.575923 -1.133331 -0.627094  1  2 -0.121604
2   1  35.444030 -0.389777 -0.639801  1  2 -0.638262
3   1  47.760702  0.430589 -0.242125  1  2  1.805872
4   1  59.752079  0.701488  0.394267  1  2  1.713786
5   1  72.536368  0.216110  0.994662  1  2  0.727969
6   2  13.453683 -0.404368 -0.635023  2  2 -1.180632
7   2  24.466271 -0.722798 -0.675039  2  2 -1.484378
8   2  36.678228 -0.455428 -0.237594  2  2 -0.331654
9   2  46.896171  0.241015  0.458344  2  2  1.367726

I considered Gaussian processes with several strategies.

  1. Add a random effect on the mean function
  2. Add a random effect kernel
  3. Add a random effect to the GP
  4. Add a random effect GP

I don’t expect the first two to affect the posterior that much, so I’m looking for a solution for the last two. First I standardize everything, then extract X, y, and the patient indices.

data_sc = (
    data
    .drop("id", axis = 1)
    .apply(lambda x: (x-x.mean()) / x.std(), axis=0)
    .assign(id = data.id)
) 

X_sc = data_sc.drop(["id", "y"], axis=1).values
y_sc = data_sc.y.values
idx = data_sc.id.values - 1
n_idx = np.unique(idx).shape[0]

For strategy 3 (add a random effect to the GP), I created an unfitted Latent GP object, then added a random intercept to allow some additive variation from one patient to another (similar to the radon example).

with pm.Model() as gp_model_ri:
    # Random effect
    mu_intercept = pm.Normal("mu_intercept", mu = 0, sigma = 5)
    sigma_intercept = pm.HalfNormal("sigma_intercept", 5)
    intercept = pm.Normal(
        "intercept",
        mu = mu_intercept,
        sigma = sigma_intercept,
        shape = n_idx
    )
    
    # GP
    length_scale = pm.HalfNormal("length_scale", 5)
    amplitude = pm.HalfNormal("amplitude", 5)
    cov_func = amplitude * pm.gp.cov.ExpQuad(X_sc.shape[1], length_scale)
    gp = pm.gp.Latent(cov_func)
    gp_prior = gp.prior("gp_prior", X = X_sc)

    # Outcome = Random intercept + GP
    model = pm.Deterministic("model", intercept[idx] + gp_prior)
    error = pm.HalfCauchy("error", 5)
    y_hat = pm.Normal("y_hat", mu=model, sigma = error, observed = y_sc)

I fitted the model with minimal sampling, since life is short.

with gp_model_ri:
    trace = pm.sample(500, tune=100, chains=2, return_inferencedata=True)

I’m not sure whether the model fitted as expected, and I can’t tell because I can’t predict anything.

gp.conditional("pred", X_sc)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
    320         try:
--> 321             candidate = cls.get_contexts()[-1]  # type: Optional[T]
    322         except IndexError as e:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     82         try:
---> 83             model = Model.get_context()
     84         except TypeError:

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
    325             if error_if_none:
--> 326                 raise TypeError("No %s on context stack" % str(cls))
    327             return None

TypeError: No <class 'pymc3.model.Model'> on context stack

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/var/folders/d7/lfnvgd_51ss9_xfx75hnf8s00000gn/T/ipykernel_74882/4101386285.py in <module>
----> 1 gp.conditional("pred", X_sc)

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/gp/gp.py in conditional(self, name, Xnew, given, **kwargs)
    233         mu, cov = self._build_conditional(Xnew, *givens)
    234         shape = infer_shape(Xnew, kwargs.pop("shape", None))
--> 235         return pm.MvNormal(name, mu=mu, cov=cov, shape=shape, **kwargs)
    236 
    237 

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
     83             model = Model.get_context()
     84         except TypeError:
---> 85             raise TypeError(
     86                 "No model on context stack, which is needed to "
     87                 "instantiate distributions. Add variable inside "

TypeError: No model on context stack, which is needed to instantiate distributions. Add variable inside a 'with model:' block, or use the '.dist' syntax for a standalone distribution.

Can you tell me if my model is correct? If so, how can I predict new values from this model? Since the random intercept is not predictable for new data, is there a way to add the random intercept as noise?

Finally, if adding an extra GP for the random effect is conceivable, is it possible to add two Gaussian processes in PyMC3?

Thanks!

I can take a shot at answering some of your questions. For some good examples of GPs, check out the example notebooks, particularly the Mauna Loa example. It looks like you’re using the Latent implementation, so check out this example as well.

  1. That error is telling you that you need to use .conditional inside of the model context, so something like this:
with gp_model_ri:
    gp.conditional("pred", X_sc)
  2. Yes, it is possible to add two Gaussian processes in PyMC3: it’s as simple as adding (+) the two GP objects together, and you then treat the result as another GP (see the sketch after this list).
  3. As for making predictions with GPs, you’ll probably want to use .conditional with .sample_posterior_predictive. That will look something like this (see the second example I linked to above):
# 200 new values from x=0 to x=15
n_new = 200
X_new = np.linspace(0, 15, n_new)[:, None]

# add the GP conditional to the model, given the new X values
with model:
    f_pred = gp.conditional("f_pred", X_new)

# Sample from the GP conditional distribution
with model:
    pred_samples = pm.sample_posterior_predictive(trace.posterior, var_names=["f_pred"])
  4. I’m not too sure what the best way to handle predicting new data is, since your intercept depends on a class (id = 1 or 2); hopefully someone else has a good idea. My idea would be to set up a discrete distribution (Binomial/Multinomial depending on how many classes you have) that models the likelihood of seeing observations belonging to each class. Then make this RV observed by passing something like observed=idx as a kwarg to your Binomial/Multinomial. Next, map the classes to the appropriate intercept term contained in intercept, similar to what you’ve done with intercept[idx] towards the end of your code (except here idx is your Binomial/Multinomial variable). My explanation is a bit messy, but maybe it helps? I think this would make it such that when you call pm.sample_posterior_predictive it samples a random class for each observation and then grabs the appropriate intercept from intercept.
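
To expand on point 2, here is a minimal sketch of adding two Latent GPs. The kernels, priors, and toy X below are placeholders (not taken from your model); the point is just that adding the GP objects sums their mean and covariance functions, and the result behaves like any other GP.

import numpy as np
import pymc3 as pm

X = np.random.normal(0, 1, (50, 1))  # toy inputs, one feature

with pm.Model() as two_gp_model:
    # A smooth, long-lengthscale component
    ls_long = pm.HalfNormal("ls_long", 5)
    gp_long = pm.gp.Latent(cov_func=pm.gp.cov.ExpQuad(1, ls=ls_long))

    # A rougher, short-lengthscale component
    ls_short = pm.HalfNormal("ls_short", 1)
    gp_short = pm.gp.Latent(cov_func=pm.gp.cov.Matern52(1, ls=ls_short))

    # Adding the two GP objects returns a new GP whose covariance is the sum
    gp_sum = gp_long + gp_short
    f = gp_sum.prior("f", X=X)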

Hi,

I’ve put the .conditional inside the model context, and I could indeed use the index as a random variable (I used pm.Categorical). It works, though I’m still figuring out how to interpret the outcome correctly.

n_new = 10
n_vars = X_sc.shape[1]
X_new = np.random.normal(0, 1, (n_new, n_vars))

with pm.Model() as model:
    
    # Random effect
    mean_intercept = pm.Normal("mean_intercept", mu = 0, sigma = 5)
    std_intercept = pm.HalfNormal("std_intercept", 5)
    
    # intercept
    intercept = pm.Normal(
        "intercept",
        mu = mean_intercept,
        sigma = std_intercept,
        shape = n_idx
    )
    
    # GP
    length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
    amplitude = pm.Gamma("amplitude", alpha=2, beta=1)

    cov = amplitude ** 2 * pm.gp.cov.Matern52(n_vars, length_scale)
    gp = pm.gp.Latent(cov_func = cov)
    gp_prior = gp.prior("gp_prior", X = X_sc)

    # index as a distribution
    idx_d = pm.Categorical('idx_d', p=np.repeat(1, n_idx)/n_idx, observed=idx)
    
    # Random GP
    σ = pm.HalfCauchy("σ", beta=5)
    gp_ri = pm.Deterministic("gp_ri", intercept[idx_d] + gp_prior)
    y_hat = pm.Normal("y_hat", mu = gp_ri, sigma = σ, observed=y_sc)

    # Sample
    trace = pm.sample(1000, tune=1000, chains=2, cores=2, return_inferencedata=True)

    # Predict
    gp_pred = gp.conditional("gp_pred", X_sc)  # could also use X_new here
    pred_samples = pm.sample_posterior_predictive(
        trace = trace.posterior,
        var_names = ["gp_pred", "y_hat"],
        samples = 500
    )

Thanks for the answer! I’ll try adding GPs for the next round.


Hi @jlindbloom, I have another question. I inserted the random effect on the mean function, like this:

with pm.Model() as model:
    # Random effect
    mean_intercept = pm.Normal("mean_intercept", mu = 0, sigma = 5)
    std_intercept = pm.HalfNormal("std_intercept", 5)
    
    # random mean function
    constant_mean = pm.Normal(
        "constant_mean",
        mu = mean_intercept,
        sigma = std_intercept,
        shape = n_idx
    )
    
    # index as a distribution
    idx_d = pm.Categorical('idx_d', p=np.repeat(1, n_idx)/n_idx, observed=idx)
    
    # GP
    length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
    amplitude = pm.HalfCauchy("amplitude", beta=5)
    cov_f = amplitude ** 2 * pm.gp.cov.Matern52(n_vars, length_scale)
    mean_f = pm.gp.mean.Constant(constant_mean[idx_d])
    gp = pm.gp.Marginal(mean_func = mean_f, cov_func = cov_f)

    # noise
    σ = pm.HalfCauchy("σ", beta=5)
    
    # ML
    y_ = gp.marginal_likelihood("y", X=X_sc, y=y_sc, noise=σ)
    
    # Fit (poor results with pm.find_MAP)
    trace = pm.sample(1000, tune=100, chains=2, cores=2, return_inferencedata=True)
    
    # Pred
    f_pred = gp.conditional("f_pred", X_sc)
    pred_samples = pm.sample_posterior_predictive(trace.posterior, var_names=["f_pred"], samples=200)

The prediction is based on the X_sc matrix. I was wondering how the observed ID (idx) is included in the prediction. Is it randomly sampled from the pm.Categorical distribution? How would you generate predictions for a known idx?

Thanks!

As a separate approach, you may want to consider using a coregionalized GP, as shown here. I think this is essentially what you’re after with “add a random effect GP”. This creates a GP with a vector output, one element for each class, where the outputs are correlated with one another and the degree of correlation is learned during sampling/optimization. The downside of this approach is that it can be a bit overconfident and prone to overfitting, but otherwise it performs well.
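
In case it helps with interpretation, my understanding of the Coregion kernel (my reading, not something spelled out in the linked example, so double-check) is that it builds a between-class covariance matrix B = W Wᵀ + diag(κ), and the kernel value for two rows is the entry of B selected by their class labels in the active dimension. A toy numpy illustration:

import numpy as np

n_classes = 2
W = np.random.randn(n_classes, 1)           # low-rank factor (a free parameter in the model)
kappa = np.abs(np.random.randn(n_classes))  # per-class variance (also a free parameter)

# Between-class covariance matrix used by pm.gp.cov.Coregion
B = W @ W.T + np.diag(kappa)

# Covariance between an observation from class 0 and one from class 1
print(B[0, 1])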


Good idea! This is how it can be done, for the record.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pymc3 as pm

# Collect data
data = pd.read_csv('https://gist.githubusercontent.com/essicolo/f31641cee96e855772d4ca4151bc8cdf/raw/48043931bbd619ef4992ed202e45c9a292c44b62/sim-data.csv')

# Scale data
data_sc = (
    data
    .drop("id", axis = 1)
    .apply(lambda x: (x-x.mean()) / x.std(), axis=0)
    .assign(id = data.id)
) 

# Separate X, y and the index
X_sc = data_sc.drop(["id", "y"], axis=1).values
y_sc = data_sc.y.values
idx = data_sc.id.values - 1 # index of the patient (-1 so that first is 0, not 1)
n_idx = np.unique(idx).shape[0] # number of patients
n_vars = X_sc.shape[1]

# For coregionalization, the features and the IDs must be merged
Xsc_idx = np.concatenate((X_sc, idx.reshape(-1, 1)), axis=1)
active_idx = Xsc_idx.shape[1] - 1 # returns the index of the last column, where the id is

# The PyMC3 model
with pm.Model() as model:
    # GP
    ## Feature covariance
    length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
    amplitude = pm.HalfCauchy("amplitude", beta=5)
    cov_features = amplitude ** 2 * pm.gp.cov.Matern52(
        input_dim = Xsc_idx.shape[1],
        ls = length_scale,
        active_dims=np.arange(0, n_vars) # all except index
    )

    ## Coregion covariance
    W = pm.Normal(
        "W", mu = 0, sd = 3,
        shape = (n_idx, n_idx),
        testval = np.random.randn(n_idx, n_idx)
    )
    kappa = pm.Gamma("kappa", alpha = 1.5, beta = 1, shape = n_idx)
    coreg = pm.gp.cov.Coregion(
        input_dim = Xsc_idx.shape[1],
        active_dims = [active_idx], # only index
        kappa = kappa,
        W = W
    )
    
    ## Combined covariance
    cov_f = coreg * cov_features

    ## GP noise
    σ = pm.HalfCauchy("σ", beta=5)

    ## Gaussian process
    gp = pm.gp.Marginal(cov_func = cov_f)

    ## Marginal likelihood
    y_ = gp.marginal_likelihood("y", X=Xsc_idx, y=y_sc, noise=σ)
    
    # Fit
    mp = pm.find_MAP()
    
    # Predict
    f_pred = gp.conditional("f_pred", Xsc_idx)
    pred_samples = pm.sample_posterior_predictive([mp], var_names=["f_pred"], samples=200)
    
# Results
## Mean
yhat_samples = pred_samples['f_pred'].mean(axis=0)

## Standard-deviation
yhat_samples_std = pred_samples['f_pred'].std(axis=0)

## Mean and std back to original scale
yhat_samples_o = yhat_samples * data.y.std() + data.y.mean()
yhat_samples_std_o = yhat_samples_std * data.y.std()

## Plot
plt.figure(figsize=(6, 6)) 
plt.errorbar(data.y.values, yhat_samples_o, yerr=yhat_samples_std_o, fmt=".", alpha = 0.5)
plt.plot((-3, 3), (-3, 3))
plt.xlabel("Observed")
plt.ylabel("Predicted")
plt.gca().set_aspect('equal')
