# Gaussian Processes and Unusual Noise

I am trying to model a process that looks a bit like a function with heavy-tailed noise, where some values are additionally multiplied by a random value between 0 and 1. Consider the following example data.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 50)
y = 5 + np.sin(x)
noise = 0.2 * rng.standard_t(df=1.5, size=x.shape[0])
# roughly 20% of the values get multiplied by a random number between 0 and 1;
# the rest keep a factor of 1
eff = np.minimum(5 * rng.random(size=x.shape[0]), 1)
obs = (y + noise) * eff
```
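
If you plot it (a quick sketch, assuming matplotlib), you can see the underlying sine curve, the occasional heavy-tailed spike, and the handful of points dragged down towards zero:

```python
import matplotlib.pyplot as plt

plt.plot(x, y, label="true function")
plt.plot(x, obs, "o", label="observations")
plt.legend()
plt.show()
```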

I can leverage `gp.Latent` with a `StudentT` likelihood to capture the underlying function quite well, as per the following:

```python
import pymc as pm

with pm.Model() as model:
    # Underlying GP
    ls = pm.Gamma("ls", mu=3, sigma=1)
    scale = pm.Gamma("scale", mu=1, sigma=0.5)
    cov = scale**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=ls)
    gp = pm.gp.Latent(cov_func=cov)
    f = gp.prior("f", x[:, None])

    # Noise
    offset = pm.Normal("offset", mu=5, sigma=2)
    nu = 1 + pm.Gamma("noise_nu", mu=1, sigma=1)
    sigma = pm.HalfNormal("noise_sigma", sigma=1)
    pm.StudentT("noisy_f", nu=nu, mu=offset + f, sigma=sigma, observed=obs)

    # Sample
    trace = pm.sample()

x_new = np.linspace(-4, 4, 100)
with model:
    f_pred = gp.conditional("f_pred", x_new[:, None])
    pm.StudentT("noisy_f_pred", nu=nu, mu=offset + f_pred, sigma=sigma)
    ppc = pm.sample_posterior_predictive(trace, var_names=["f_pred", "noisy_f_pred"])
```
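
To check the fit visually (a quick, hedged sketch, assuming matplotlib and the default `InferenceData` return values):

```python
import matplotlib.pyplot as plt

# posterior-predictive draws of the latent function on the prediction grid
f_pred_mean = ppc.posterior_predictive["f_pred"].mean(("chain", "draw"))
offset_mean = trace.posterior["offset"].mean().item()

plt.plot(x, obs, "o", label="observations")
plt.plot(x_new, offset_mean + f_pred_mean, label="posterior mean of offset + f")
plt.legend()
plt.show()
```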

However, while this approach captures the underlying function well, it overestimates the noise variance by quite a bit and obviously doesn't capture the random drops in "efficiency" (i.e. the multiplications by values between 0 and 1).

How can I model this process more accurately?

I've tried a couple of things, but they always lead to strange or completely wrong results. I'm interested in creating a generative model so I can sample other similar-looking data.

SIDE-NOTE: I am loving the new `HSGP` class and the ability to leverage `prior_linearized` to treat GPs like a regular linear model and get rapid inference… but I didn't want to complicate my example above by including it.


For example, I tried adding this after the definition of `f`, which does seem to work for the example above, i.e. the estimated noise variance is a little lower, as expected.

```python
eff = pm.Uniform("efficiency", lower=0, upper=5, size=x.shape)
f *= pm.math.clip(eff, 0, 1)
```

For completeness, here is my `HSGP` implementation… which I am struggling to make converge.

```python
with pm.Model() as model:
    # Mutable input data
    X_data = pm.MutableData("X", x[:, None])
    obs_data = pm.MutableData("y", obs)

    # Underlying GP
    ls = pm.Gamma("ls", mu=3, sigma=1)
    scale = pm.Gamma("scale", mu=1, sigma=0.5)
    cov = scale**2 * pm.gp.cov.ExpQuad(input_dim=1, ls=ls)

    gp = pm.gp.HSGP(m=[50], L=[4], cov_func=cov)
    phi, sqrt_psd = gp.prior_linearized(X_data)  # already centred
    beta = pm.Normal("beta", size=gp._m_star)
    f = pm.Deterministic("f", phi @ (beta * sqrt_psd))

    # Efficiency
    eff = pm.Uniform("eff", lower=0, upper=5, size=X_data.shape[0])
    f *= pm.math.clip(eff, 0, 1)

    # Noise
    offset = pm.Normal("offset", mu=5, sigma=2)
    nu = 1 + pm.Gamma("noise_nu", mu=1, sigma=1)
    sigma = pm.HalfNormal("noise_sigma", sigma=1)
    pm.StudentT("noisy_f", nu=nu, mu=offset + f, sigma=sigma, observed=obs_data)

    # Sample
    trace = pm.sample(target_accept=0.98)

x_new = np.linspace(-4, 4, 100)
with model:
    pm.set_data({"X": x_new[:, None], "y": np.zeros(x_new.shape)})
    ppc = pm.sample_posterior_predictive(trace, var_names=["noisy_f"])
```

However, I am still interested to hear whether this is the right approach to this problem, or how I might get the `HSGP` implementation to converge.

The approach with `clip` is a neat idea, but unfortunately it makes the posterior, as seen by the sampler, quite tricky. If the `eff` parameter is larger than 1, the derivative will be zero, and the sampler can't know that making it smaller will eventually change the logp. In general it is good to avoid any functions that don't have a well-defined derivative everywhere in the parameter space, or that aren't analytic.
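
You can see that flat gradient concretely (a small self-contained check using pytensor directly, separate from the model above):

```python
import pytensor
import pytensor.tensor as pt

eff = pt.dscalar("eff")
clipped = pt.clip(eff, 0.0, 1.0)
grad_fn = pytensor.function([eff], pytensor.grad(clipped, eff))

print(grad_fn(0.5))  # 1.0 -- inside the bounds the gradient is informative
print(grad_fn(2.0))  # 0.0 -- above the upper bound the sampler gets no signal at all
```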

Effectively you have a discrete parameter for each of the observations that tells us whether that value needs to be scaled down. So you could code it along those lines:

```python
scale_down = pm.Bernoulli("scale_down", p=0.2, dims="observation")
eff = pm.Uniform("eff", lower=0, upper=1, dims="observation")
mu = offset + pm.math.switch(scale_down, eff * f, f)
```

This is still not a great solution, because now we have discrete parameters in the model, and we can't use NUTS for those variables.
We can however do an extra step and marginalize out those variables. (I thought there was a tutorial about that somewhere in the docs, but I can't find anything right now. Does anyone else know a good one?)
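
Concretely, summing the Bernoulli indicator out of the joint density leaves a two-component mixture for each observation (writing $w$ for the drop probability, here 0.2):

$$
p(y_i \mid f_i) = w\,\mathrm{StudentT}\left(y_i \mid \nu,\ \mathrm{offset} + \mathrm{eff}_i f_i,\ \sigma\right) + (1 - w)\,\mathrm{StudentT}\left(y_i \mid \nu,\ \mathrm{offset} + f_i,\ \sigma\right),
$$

which is exactly what `pm.Mixture` computes.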

Something like this should do it (I didn't test it, but hopefully you get the idea; if not, please ask):

```python
pm.Mixture(
    "noisy_f",
    [0.2, 0.8],
    [
        pm.StudentT.dist(nu=nu, mu=offset + eff * f, sigma=sigma),
        pm.StudentT.dist(nu=nu, mu=offset + f, sigma=sigma),
    ],
    observed=obs,
)
```

There are definitely examples that people have posted here (e.g., this old example), but I'm not sure there is a clean, simple example in the notebook gallery/documentation.


https://www.pymc.io/projects/experimental/en/latest/generated/pymc_experimental.marginal_model.MarginalModel.html#pymc_experimental.marginal_model.MarginalModel

…but in this case it's equivalent to the mixture.
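
For reference, a rough, untested sketch of how `MarginalModel` could be used on a toy version of the problem (the data, priors, and variable names here are made up for illustration, and I'm assuming `marginalize` handles an elementwise Bernoulli like this):

```python
import numpy as np
import pymc as pm
from pymc_experimental import MarginalModel

# Toy data: a constant signal of 5 with a couple of "dropped" observations
toy_obs = np.array([4.9, 5.1, 1.2, 5.0, 2.3, 4.8])

with MarginalModel() as m:
    drop = pm.Beta("drop", alpha=1, beta=4)  # probability of a drop
    dropped = pm.Bernoulli("dropped", p=drop, shape=len(toy_obs))
    eff = pm.Uniform("eff", lower=0, upper=1, shape=len(toy_obs))
    mu = pm.math.switch(dropped, 5 * eff, 5.0)
    pm.Normal("y", mu=mu, sigma=0.3, observed=toy_obs)

    # Sum the discrete indicators out of the logp so NUTS can handle everything
    m.marginalize([dropped])
    idata = pm.sample()
```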

Back to clipping: TensorFlow has a sort of differentiable clip operation that could be interesting: `tfp.bijectors.SoftClip` (TensorFlow Probability).
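
Something in the same spirit can be rolled by hand with softplus-style functions in PyMC (a hedged sketch; `soft_clip` and `sharpness` are names I made up, and this isn't the exact SoftClip formula, just the same idea of a smooth, everywhere-differentiable clip):

```python
import pymc as pm

def soft_clip(x, lower=0.0, upper=1.0, sharpness=10.0):
    """Smooth, everywhere-differentiable approximation of clip(x, lower, upper)."""
    s = sharpness
    return lower + (pm.math.log1pexp(s * (x - lower)) - pm.math.log1pexp(s * (x - upper))) / s

# e.g. instead of the hard clip from earlier:
# f *= soft_clip(eff, 0, 1)
```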


This is very helpful. Thank you! Let me give it a go.

Ok, I tried this approach (including a prior on the efficiency drop ratio):

```python
# Efficiency
drop = pm.Uniform("drop", lower=0, upper=0.3)
eff = pm.Uniform("eff", lower=0, upper=1, size=f.shape)

# Likelihood
pm.Mixture(
    "noisy_f",
    [drop, 1 - drop],
    [
        pm.StudentT.dist(nu=nu, mu=offset + f * eff, sigma=sigma),
        pm.StudentT.dist(nu=nu, mu=offset + f, sigma=sigma),
    ],
    observed=obs_data,
)
```

And this seems to work well for the standard GP case. However, again, when I try it with `HSGP`, the model very quickly diverges, with an r-hat of inf.

Do you have any suggestions for how I could address this? Maybe narrowing the priors?

I will note that whilst I can't get the `HSGP` implementation to work for this toy problem, the approach does seem to be working for the real problem I'm working on. So thanks a lot, everyone! This was really instructive.