I have a data set of (x, y, z), height data. By eye, I can see that the height values follow some periodic pattern on top of some smoothly varying pattern.
I am using pymc 5.0.1
I want to be able to separate out the periodic pattern from the smooth pattern and I am hoping a gaussian process can help me do this.
Because of domain knowledge, I know that one thing is causing the smoothly varying portion of the data and another factor is causing the periodic portion of the data. I want to be able to look at them separately.
I have constructed a toy dataset to help illustrate what I am trying to do in my work.
import pymc as pm import numpy as np import arviz as az import matplotlib as mpl import matplotlib.pyplot as plt # toy dataset setup npoints = 50 bound = 10 xcoords = np.linspace(-bound, bound, npoints) ycoords = np.linspace(-bound, bound, npoints) xmesh, ymesh = np.meshgrid(xcoords, ycoords) # height data is a quadratic portion added to a periodic portion zmesh = (xmesh**2 + ymesh**2)/bound + np.cos(2*np.pi*xmesh / (bound/5)) noise = np.random.random(zmesh.shape)*(.1*bound) z = (zmesh+noise).flatten() xy = np.c_[xmesh.flatten(), ymesh.flatten()] z = (z - z.mean()) / z.std()
data looks like:
I have been going over the examples in https://github.com/pymc-devs/pymc-examples/tree/main/examples/gaussian_processes and this is what I came up with for modelling:
with pm.Model() as model: # primary effects ls = pm.Gamma('ls', alpha=2, beta=1, shape=2) eta= pm.HalfCauchy('eta', beta=5) cov_primary = eta**2 * pm.gp.cov.Matern52(2, ls=ls) # gp_primary = pm.gp.Marginal(cov_func=cov_primary) # periodic effects ls_periodic = pm.Gamma('ls_periodic', alpha=2, beta=1, shape=2) tau_periodic= pm.Gamma('tau_periodic', alpha=2, beta=1, shape=2) eta_periodic= pm.HalfCauchy('eta_periodic', beta=5) cov_periodic = eta_periodic**2 * pm.gp.cov.Periodic(2, period=tau_periodic, ls=ls_periodic) # gp_periodic = pm.gp.Marginal(cov_func=cov_periodic) # overall # gp = gp_primary + gp_periodic gp = pm.gp.Marginal(cov_func=cov_primary + cov_periodic) sigma = pm.HalfCauchy("sigma", beta=5) z_ = gp.marginal_likelihood('z', X=xy, y=z, sigma=sigma) with model: mp = pm.find_MAP(method='BFGS')
And then I have to use .find_map() because sampling takes way too long.
I look at how well it did by expanding the xy window and using gp.predict()
xcoords_new = np.linspace(-2*bound, 2*bound, 4*npoints) ycoords_new = np.linspace(-2*bound, 2*bound, 4*npoints) Xnew = pm.math.cartesian(xcoords_new[:, None], ycoords_new[:, None]) with model: mu, var = gp.predict(Xnew, point=mp, diag=True) cmap = 'rainbow' norm = mpl.colors.Normalize(vmin=z.min()) plt.imshow(mu.reshape(4*npoints, 4*npoints).T,norm=norm, cmap=cmap)
The resulting image doesn’t look too bad compared to the true data.
My question is:
How do I plot just the contribution from the periodic portion?
with model: a = np.random.multivariate_normal(pm.gp.mean.Zero()(xy).eval(), cov_periodic(xy).eval()) plt.imshow(a.reshape(npoints,npoints), cmap='rainbow')
But that does not look like a cosine wave dependent only on x. Is that the correct way to extract the periodic portion? I am unsure if the result is because of a poor model or if I am trying to extract the periodic portion incorrectly.
is this an appropriate application of gaussian processes?
I realize I may be using the wrong tool for the job. Also, my actual task will have a few orders of magnitude more datapoints and pymc took a while with just a 30x30 grid so I am worried I won’t be able to scale this up for my actual problem.
I will try to include plots of the data and output later. My work firewall is blocking uploads.