Hi,
I have longitudinal patient data (age, sex, biomarkers, etc.) together with a health outcome. Simply put, I aim to predict the health outcome as patients age. I created a toy data set for this purpose.
import pandas as pd
data = pd.read_csv('https://gist.githubusercontent.com/essicolo/f31641cee96e855772d4ca4151bc8cdf/raw/48043931bbd619ef4992ed202e45c9a292c44b62/sim-data.csv')
data.head(10)
|   | id | age | A | B | C | D | y |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 15.409806 | -1.232430 | -0.209489 | 1 | 2 | -2.094019 |
| 1 | 1 | 22.575923 | -1.133331 | -0.627094 | 1 | 2 | -0.121604 |
| 2 | 1 | 35.444030 | -0.389777 | -0.639801 | 1 | 2 | -0.638262 |
| 3 | 1 | 47.760702 | 0.430589 | -0.242125 | 1 | 2 | 1.805872 |
| 4 | 1 | 59.752079 | 0.701488 | 0.394267 | 1 | 2 | 1.713786 |
| 5 | 1 | 72.536368 | 0.216110 | 0.994662 | 1 | 2 | 0.727969 |
| 6 | 2 | 13.453683 | -0.404368 | -0.635023 | 2 | 2 | -1.180632 |
| 7 | 2 | 24.466271 | -0.722798 | -0.675039 | 2 | 2 | -1.484378 |
| 8 | 2 | 36.678228 | -0.455428 | -0.237594 | 2 | 2 | -0.331654 |
| 9 | 2 | 46.896171 | 0.241015 | 0.458344 | 2 | 2 | 1.367726 |
I considered Gaussian processes with several strategies:
- Add a random effect on the mean function
- Add a random effect kernel
- Add a random effect to the GP
- Add a random effect GP
I don’t expect the first two to affect the posterior much, so I’m looking for a solution based on the last two. First I standardize everything, then extract X, y and the patient indices.
import numpy as np

data_sc = (
    data
    .drop("id", axis=1)
    .apply(lambda x: (x - x.mean()) / x.std(), axis=0)
    .assign(id = data.id)
)
X_sc = data_sc.drop(["id", "y"], axis=1).values
y_sc = data_sc.y.values
idx = data_sc.id.values - 1  # zero-based patient index
n_idx = np.unique(idx).shape[0]  # number of patients
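Just to show what these arrays look like (a quick sanity check on my side, nothing more):

# hypothetical check: the observations and the patient index should line up
assert X_sc.shape[0] == y_sc.shape[0] == idx.shape[0]
print(X_sc.shape, y_sc.shape, n_idx)  # (n_obs, n_features), (n_obs,), number of patients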
For strategy 3 (add a random effect to the GP), I created an unfitted gp.Latent object, then added a random intercept to allow some additive variation from one patient to another (similar to the radon example).
with pm.Model() as gp_model_ri:
    # Random effect: per-patient intercept
    mu_intercept = pm.Normal("mu_intercept", mu=0, sigma=5)
    sigma_intercept = pm.HalfNormal("sigma_intercept", 5)
    intercept = pm.Normal(
        "intercept",
        mu=mu_intercept,
        sigma=sigma_intercept,
        shape=n_idx
    )
    # GP over the standardized covariates
    length_scale = pm.HalfNormal("length_scale", 5)
    amplitude = pm.HalfNormal("amplitude", 5)
    cov_func = amplitude * pm.gp.cov.ExpQuad(X_sc.shape[1], length_scale)
    gp = pm.gp.Latent(cov_func)
    gp_prior = gp.prior("gp_prior", X=X_sc)
    # Outcome = random intercept + GP
    model = pm.Deterministic("model", intercept[idx] + gp_prior)
    error = pm.HalfCauchy("error", 5)
    y_hat = pm.Normal("y_hat", mu=model, sigma=error, observed=y_sc)
I fitted the model with minimal sampling, since life is short.
with gp_model_ri:
    trace = pm.sample(500, tune=100, chains=2, return_inferencedata=True)
I’m not sure the model fitted as expected, and I can’t check because I can’t predict anything.
gp.conditional("pred", X_sc)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
320 try:
--> 321 candidate = cls.get_contexts()[-1] # type: Optional[T]
322 except IndexError as e:
IndexError: list index out of range
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
82 try:
---> 83 model = Model.get_context()
84 except TypeError:
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/model.py in get_context(cls, error_if_none)
325 if error_if_none:
--> 326 raise TypeError("No %s on context stack" % str(cls))
327 return None
TypeError: No <class 'pymc3.model.Model'> on context stack
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
/var/folders/d7/lfnvgd_51ss9_xfx75hnf8s00000gn/T/ipykernel_74882/4101386285.py in <module>
----> 1 gp.conditional("pred", X_sc)
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/gp/gp.py in conditional(self, name, Xnew, given, **kwargs)
233 mu, cov = self._build_conditional(Xnew, *givens)
234 shape = infer_shape(Xnew, kwargs.pop("shape", None))
--> 235 return pm.MvNormal(name, mu=mu, cov=cov, shape=shape, **kwargs)
236
237
~/opt/miniconda3/envs/edge_env/lib/python3.9/site-packages/pymc3/distributions/distribution.py in __new__(cls, name, *args, **kwargs)
83 model = Model.get_context()
84 except TypeError:
---> 85 raise TypeError(
86 "No model on context stack, which is needed to "
87 "instantiate distributions. Add variable inside "
TypeError: No model on context stack, which is needed to instantiate distributions. Add variable inside a 'with model:' block, or use the '.dist' syntax for a standalone distribution.
Can you tell me if my model is correct? If so, how can I predict new values from it? Since the random intercept can’t be estimated for new patients, is there a way to include it as noise instead?
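To clarify what I’m after, here is roughly the workflow I had in mind (only a sketch, with names I made up; I’m assuming the conditional has to be built inside the model context, and for an unseen patient I simply draw the intercept from its population distribution as a stand-in for that "noise"):

X_new = X_sc  # new covariate values would go here; I reuse the training inputs just for illustration

with gp_model_ri:
    # conditional GP over the new inputs (assuming it must be created inside the model context)
    gp_pred = gp.conditional("gp_pred", Xnew=X_new)
    # intercept for an unseen patient, drawn from the population distribution
    new_intercept = pm.Normal("new_intercept", mu=mu_intercept, sigma=sigma_intercept)
    y_pred = pm.Deterministic("y_pred", new_intercept + gp_pred)
    ppc = pm.sample_posterior_predictive(trace, var_names=["y_pred"])

Is that the right idea, or am I off track?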
Finally, if adding an extra GP for the random effect (strategy 4) is conceivable, is it possible to add two Gaussian processes in PyMC3?
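By that I mean something along these lines (again only a sketch with made-up names: a population-level GP over all covariates plus a second GP restricted to age, with their latent functions simply summed in the mean):

with pm.Model() as gp_model_regp:
    # population-level GP over all standardized covariates
    ls_pop = pm.HalfNormal("ls_pop", 5)
    eta_pop = pm.HalfNormal("eta_pop", 5)
    gp_pop = pm.gp.Latent(cov_func=eta_pop * pm.gp.cov.ExpQuad(X_sc.shape[1], ls_pop))
    f_pop = gp_pop.prior("f_pop", X=X_sc)

    # second GP meant to act as the random effect, restricted to the age column
    # (active_dims=[0] assumes age is the first column of X_sc, as above)
    ls_re = pm.HalfNormal("ls_re", 5)
    eta_re = pm.HalfNormal("eta_re", 5)
    gp_re = pm.gp.Latent(cov_func=eta_re * pm.gp.cov.ExpQuad(X_sc.shape[1], ls_re, active_dims=[0]))
    f_re = gp_re.prior("f_re", X=X_sc)

    # outcome = sum of the two latent functions
    error = pm.HalfCauchy("error", 5)
    y_hat = pm.Normal("y_hat", mu=f_pop + f_re, sigma=error, observed=y_sc)

Would summing the two prior calls like this be a legitimate way of adding two GPs, or is there an intended mechanism for it?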
Thanks!