# Modeling longitudinal data among patients by coupling gaussian processes with random effects

Hi,

I have longitudinal patient data including their age, sex, biomarkers, etc., and a health outcome. Simply put, I aim at predicting health through aging. I created a toy data set for this purpose.

``````import pandas as pd
``````
id age A B C D y
0 1 15.409806 -1.232430 -0.209489 1 2 -2.094019
1 1 22.575923 -1.133331 -0.627094 1 2 -0.121604
2 1 35.444030 -0.389777 -0.639801 1 2 -0.638262
3 1 47.760702 0.430589 -0.242125 1 2 1.805872
4 1 59.752079 0.701488 0.394267 1 2 1.713786
5 1 72.536368 0.216110 0.994662 1 2 0.727969
6 2 13.453683 -0.404368 -0.635023 2 2 -1.180632
7 2 24.466271 -0.722798 -0.675039 2 2 -1.484378
8 2 36.678228 -0.455428 -0.237594 2 2 -0.331654
9 2 46.896171 0.241015 0.458344 2 2 1.367726

I considered Gaussian processes with several strategies.

1. Add a random effect on the mean function
2. Add a random effect kernel
3. Add a random effect to the GP
4. Add a random effect GP

I don’t expect that the two firsts will affect the posterior that much, so I’m looking for a solution on the last twos. First I put everything to scale then extract information for X, y and the patients.

``````data_sc = (
data
.drop("id", axis = 1)
.apply(lambda x: (x-x.mean()) / x.std(), axis=0)
.assign(id = data.id)
)

X_sc = data_sc.drop(["id", "y"], axis=1).values
y_sc = data_sc.y.values
idx = data_sc.id.values - 1
n_idx = np.unique(idx).shape[0]
``````

For strategy 3 (add a random effect to the GP), I created an unfitted GP Latent object, then added a random intercept to it to allow some additive variation from a patient to another (similarly to the radon example).

``````with pm.Model() as gp_model_ri:
# Random effect
mu_intercept = pm.Normal("mu_intercept", mu = 0, sigma = 5)
sigma_intercept = pm.HalfNormal("sigma_intercept", 5)
intercept = pm.Normal(
"intercept",
mu = mu_intercept,
sigma = sigma_intercept,
shape = n_idx
)

# GP
lenght_scale = pm.HalfNormal("lenght_scale", 5)
amplitude = pm.HalfNormal("amplitude", 5)
cov_func = amplitude * pm.gp.cov.ExpQuad(X_sc.shape[1], lenght_scale)
gp = pm.gp.Latent(cov_func)
gp_prior = gp.prior("gp_prior", X = X_sc)

# Outcome = Random intercept + GP
model = pm.Deterministic("model", intercept[idx] + gp_prior)
error = pm.HalfCauchy("error", 5)
y_hat = pm.Normal("y_hat", mu=model, sigma = error, observed = y_sc)
``````

I fitted the model with minimal sampling, since life is short.

``````with gp_model_ri:
trace = pm.sample(500, tune=100, chains=2, return_inferencedata=True)
``````

I’m not sure if the model was fitted as expected. I have no idea since I can’t predict anything.

``````gp.conditional("pred", X_sc)
``````
``````---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
320         try:
--> 321             candidate = cls.get_contexts()[-1]  # type: Optional[T]
322         except IndexError as e:

IndexError: list index out of range

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
82         try:
---> 83             model = Model.get_context()
84         except TypeError:

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
325             if error_if_none:
--> 326                 raise TypeError("No %s on context stack" % str(cls))
327             return None

TypeError: No <class 'pymc3.model.Model'> on context stack

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/var/folders/d7/lfnvgd_51ss9_xfx75hnf8s00000gn/T/ipykernel_74882/4101386285.py in <module>
----> 1 gp.conditional("pred", X_sc)

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/gp/gp.py in conditional(self, name, Xnew, given, **kwargs)
233         mu, cov = self._build_conditional(Xnew, *givens)
234         shape = infer_shape(Xnew, kwargs.pop("shape", None))
--> 235         return pm.MvNormal(name, mu=mu, cov=cov, shape=shape, **kwargs)
236
237

~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
83             model = Model.get_context()
84         except TypeError:
---> 85             raise TypeError(
86                 "No model on context stack, which is needed to "
87                 "instantiate distributions. Add variable inside "

TypeError: No model on context stack, which is needed to instantiate distributions. Add variable inside a 'with model:' block, or use the '.dist' syntax for a standalone distribution.
``````

Can you tell me if my model is correct? If so, how can I predict new values from this model? Since the random intercept is not predictable for new data, is there a way to add the random intercept as noise?

Finally, if adding an extra GP for the random effect is conceivable, is it possible to add two gaussian processes in PyMC3?

Thanks!

I can take a shot at answering some of your questions. For some good examples of GPs check out the example notebooks, particularly the Mauna Loa example. It looks like you’re using the `Latent` implementation so also check out this example as well.

1. That error is telling you that you need to use `.conditional` inside of the model context, so something like this:
``````with gp_model_ri:
gp.conditional("pred", X_sc)
``````
1. Yes, it is possible to add two Gaussian Processes in PyMC. This is just as simple as adding (+) the two GP objects together, then you just treat the result as another GP.
2. As for making predictions with GPs, you’ll probably want to use `.conditional` with `.sample_posterior_predictive`. This will look something like this (see the second example I linked to above):
``````# 200 new values from x=0 to x=15
n_new = 200
X_new = np.linspace(0, 15, n_new)[:, None]

# add the GP conditional to the model, given the new X values
with model:
f_pred = gp.conditional("f_pred", X_new)

# Sample from the GP conditional distribution
with model:
pred_samples = pm.sample_posterior_predictive(trace.posterior, vars=[f_pred])
``````
1. I’m not too sure what the best way to handle predicting new data is since your intercept depends on a class (`id` = 1 or 2), hopefully someone else has a good idea. My idea would be to set up a discrete distribution with `Binomial`/`Multinomial` depending on how many classes you have. The point of this distribution is to model the likelihood of seeing observations belonging to each class. Then, make this RV observed by passing in something like `observed=idx` as a kwarg to your `Binomial`/`Multinomial`. Next, map the classes somehow to the appropriate intercept term contained in `intercept`, similar to what you’ve done with `intercept[idx]` towards the end of your code (except here `idx` is your variable for `Binomial`/`Multinomial`). My explanation of this is a bit messy but maybe this helps? I think this would make it such that when you call `pm.sample_posterior_predictive` it samples a random class for each observation and then grabs the appropriate intercept from `intercept`.
1 Like

Hi,

I’ve put the `.conditional` in the context, then I could indeed use the index as a random variable (I used `pm.Categorical`). And it works, though I’m still figuring out how to correctly interpret the outcome.

``````n_new = 10
n_vars = X_sc.shape[1]
X_new = np.random.normal(0, 1, (n_new, n_vars))

with pm.Model() as model:

# Random effect
mean_intercept = pm.Normal("mean_intercept", mu = 0, sigma = 5)
std_intercept = pm.HalfNormal("std_intercept", 5)

# intercept
intercept = pm.Normal(
"intercept",
mu = mean_intercept,
sigma = std_intercept,
shape = n_idx
)

# GP
length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
amplitude = pm.Gamma("amplitude", alpha=2, beta=1)

cov = amplitude ** 2 * pm.gp.cov.Matern52(n_vars, length_scale)
gp = pm.gp.Latent(cov_func = cov)
gp_prior = gp.prior("gp_prior", X = X_sc)

# index as a distribution
idx_d = pm.Categorical('idx_d', p=np.repeat(1, n_idx)/n_idx, observed=idx)

# Random GP
σ = pm.HalfCauchy("σ", beta=5)
gp_ri = pm.Deterministic("gp_ri", intercept[idx_d] + gp_prior)
y_hat = pm.Normal("y_hat", mu = gp_ri, sigma = σ, observed=y_sc)

# Sample
trace = pm.sample(1000, tune=1000, chains=2, cores=2, return_inferencedata=True)

# Predict
gp_pred = gp.conditional("gp_pred", X_sc) # X_new)
pred_samples = pm.sample_posterior_predictive(
trace = trace.posterior,
var_names = ["gp_pred", "y_hat"],
samples = 500
)
``````

Thanks for the answer! I’ll try adding GPs for the next round.

1 Like

Hi @jlindbloom , I have another question here. I inserted the random effect on the mean function, such as

``````with pm.Model() as model:
# Random effect
mean_intercept = pm.Normal("mean_intercept", mu = 0, sigma = 5)
std_intercept = pm.HalfNormal("std_intercept", 5)

# random mean function
constant_mean = pm.Normal(
"constant_mean",
mu = mean_intercept,
sigma = std_intercept,
shape = n_idx
)

# index as a distribution
idx_d = pm.Categorical('idx_d', p=np.repeat(1, n_idx)/n_idx, observed=idx)

# GP
length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
amplitude = pm.HalfCauchy("amplitude", beta=5)
cov_f = amplitude ** 2 * pm.gp.cov.Matern52(n_vars, length_scale)
mean_f = pm.gp.mean.Constant(constant_mean[idx_d])
gp = pm.gp.Marginal(mean_func = mean_f, cov_func = cov_f)

# noise
σ = pm.HalfCauchy("σ", beta=5)

# ML
y_ = gp.marginal_likelihood("y", X=X_sc, y=y_sc, noise=σ)

# Fit (poor results with pm.find_MAP)
trace = pm.sample(1000, tune=100, chains=2, cores=2, return_inferencedata=True)

# Pred
f_pred = gp.conditional("f_pred", X_sc)
pred_samples = pm.sample_posterior_predictive(trace.posterior, var_names=["f_pred"], samples=200)
``````

The prediction is based on the `X_sc` matrix. I was wondering how the observed ID (`idx`) is included in the prediction? Is it randomly sampled from the `pm.Categorical`distribution? How would you generate predictions from a known `idx`?

Thanks!

As a separate approach, you may want to consider using a coregionalized GP, as shown here. I think this is essentially what you’re after with “add a random effect GP”. This creates a GP with a vector output, one element for each class, where the outputs are correlated with one another and the degree of correlation is learned during sampling/optimization. The downside of this approach is it can be a bit overconfident/overfit, but otherwise it performs well.

4 Likes

Good idea! This is how it can be done, for the record.

``````import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import pymc3 as pm

# Collect data

# Scale data
data_sc = (
data
.drop("id", axis = 1)
.apply(lambda x: (x-x.mean()) / x.std(), axis=0)
.assign(id = data.id)
)

# Separate X, y and the index
X_sc = data_sc.drop(["id", "y"], axis=1).values
y_sc = data_sc.y.values
idx = data_sc.id.values - 1 # index of the patient (-1 so that first is 0, not 1)
n_idx = np.unique(idx).shape[0] # number of patients
n_vars = X_sc.shape[1]

# For coregionalization, the features and the IDs must be merged
Xsc_idx = np.concatenate((X_sc, idx.reshape(-1, 1)), axis=1)
active_idx = Xsc_idx.shape[1] - 1 # returns the index of the last column, where the id is

# The PyMC3 model
with pm.Model() as model:
# GP
## Feature covariance
length_scale = pm.Gamma("length_scale", alpha=2, beta=1)
amplitude = pm.HalfCauchy("amplitude", beta=5)
cov_features = amplitude ** 2 * pm.gp.cov.Matern52(
input_dim = Xsc_idx.shape[1],
ls = length_scale,
active_dims=np.arange(0, n_vars) # all except index
)

## Coregion covariance
W = pm.Normal(
"W", mu = 0, sd = 3,
shape = (n_idx, n_idx),
testval = np.random.randn(n_idx, n_idx)
)
kappa = pm.Gamma("kappa", alpha = 1.5, beta = 1, shape = n_idx)
coreg = pm.gp.cov.Coregion(
input_dim = Xsc_idx.shape[1],
active_dims = [active_idx], # only index
kappa = kappa,
W = W
)

## Combined covariance
cov_f = coreg * cov_features

## GP noise
σ = pm.HalfCauchy("σ", beta=5)

## Gaussian process
gp = pm.gp.Marginal(cov_func = cov_f)

## Marginal likelihood
y_ = gp.marginal_likelihood("y", X=Xsc_idx, y=y_sc, noise=σ)

# Fit
mp = pm.find_MAP()

# Predict
f_pred = gp.conditional("f_pred", Xsc_idx)
pred_samples = pm.sample_posterior_predictive([mp], var_names=["f_pred"], samples=200)

# Results
## Mean
yhat_samples = pred_samples['f_pred'].mean(axis=0)

## Standard-deviation
yhat_samples_std = pred_samples['f_pred'].std(axis=0)

## Mean and std back to original scale
yhat_samples_o = yhat_samples * data.y.std() + data.y.mean()
yhat_samples_std_o = yhat_samples_std * data.y.std()

## Plot
plt.figure(figsize=(6, 6))
plt.errorbar(data.y.values, yhat_samples_o, yerr=yhat_samples_std_o, fmt=".", alpha = 0.5)
plt.plot((-3, 3), (-3, 3))
plt.xlabel("Observed")
plt.ylabel("Predicted")
plt.gca().set_aspect('equal')

``````
2 Likes