I have put together a spatial multi-output gaussian process and performed prior predictive simulations on it.
I am posting below for review (I have a few questions) and as a helper for other’s seeking to perform prior predictive checks for this kind of model (I couldn’t find a similar post when I started).
My primary concern is that something may be failing silently or that some functionality isn’t implemented correctly.
My code is based off these excellent posts:
- 2022-03-14_spatial/2022-03-14_pymc3.ipynb · master · Essi Parent / blog_sm · GitLab
- Multi-output Gaussian Processes: Coregionalization models using Hamadard product — PyMC example gallery
- Kronecker Structured Covariances — PyMC example gallery
My data comprises multivariate recordings on archaeological artefacts. I am seeking to model the spatial relationship as a multi-output gaussian process with intrinsic coregionalization. In my final model I will have 8 output parameters and a location for each observation as input.
My current model, with simulated input data is:
# Data structure of sample
n1, n2 = (50, 50)
x1 = np.linspace(0, 50, n1)
x2 = np.linspace(0, 50, n2)
param_idx = [0,1,2,3] # 4 parameters
result_id = np.array(param_idx*n1*n2)
Xs = pm.math.cartesian(x1[:, None], x2[:, None])
Xs = np.vstack((np.arange(0,Xs.shape[0]), Xs[:,0],Xs[:,1]))
Xs = np.vstack((np.repeat(Xs,len(param_idx),axis=1),result_id))# Total x input,input-idx x-coord,y-coord, result_id
Xs = Xs.T
Xdf =pd.DataFrame(Xs,columns=['sample_id','x coord','y coord','parameter id'])
Xtest = Xdf[['x coord','y coord','parameter id']].values
n_outputs = len(param_idx)
with pm.Model() as spatial_model:
# GP
## Feature covariance
length_scale = pm.InverseGamma("length_scale",alpha=4.5,beta=18)
amplitude = pm.HalfCauchy("amplitude", beta = 10)
cov_feature = amplitude ** 2 * pm.gp.cov.Matern52(
input_dim = 3,
ls = length_scale,
active_dims = [0,1] # all except index
)
## Coregion covariance
W = pm.Normal(
"W", mu = 0, sigma = 5,
shape = (n_outputs, n_outputs) # 4 outputs in final model
)
kappa = pm.HalfCauchy("kappa", beta = 10, shape = n_outputs)
coreg = pm.gp.cov.Coregion(
input_dim = Xtest.shape[1],
active_dims = [2], # only index
kappa = kappa,
W = W
)
## Combined covariance
cov_f = coreg * cov_feature
## Gaussian process
#gp = pm.gp.Marginal(cov_func = cov_f)
gp = pm.gp.Latent(cov_func = cov_f)
##Prior Predictive simulation
gp_prior = gp.prior('gpprior',Xtest)
f_star = gp.conditional('fstar',Xtest)
prior_sample = pm.sample_prior_predictive(draws=1)
The following is then taken from the prior_sample to visualise the results on a “map”
ig, axs = plt.subplots(2, 2)
prior_pred = prior_sampletest.prior.fstar.values[0,1,:]
pf = 0 #Parameter spatial function
for i in range(2):
for j in range(2):
axs[i,j].scatter(Xtest[:,0][Xtest[:,2]==pf], #X_coords for function pf
Xtest[:,1][Xtest[:,2]==pf], #Y_coords for function pf
s=35,
c=prior_pred[Xtest[:,2]==pf],)cmap=cmap)
axs[i,j].set_xlabel('X_coord')
axs[i,j].set_ylabel('Y_coord')
axs[i,j].set_title(f"Parameter Function: {pf}")
axs[i,j].set_aspect('equal')
pf += 1
ig.suptitle('Prior Predictive Simulations')
Which produces:
Which to my eyes look sufficiently random and spatially consistent. I would appreciate if anyone could have a look through the code to confirm the active dimensions and interactions are functioning correctly. That is, there is a unique gaussian process for each parameter/output, whose covariances are modified by a coregionalization kernel for the inter-parameter covariances.
a couple questions:
- In my understanding, the W matrix, with kappa, is the covariance between parameters/outputs. The W matrix can apparently have a lower rank imposed on it, is there a principled way to do this? Or is it generally a computation saver. The baseball MOGP example imposes a (5_output,2) structure for the W matrix.
- Sampling from this particular prior is quite slow, other than using the HSGP, is there an alternative to speed up sampling?
- Prior sampling can produce a warning that the covariance is not symmetric positive semi-definite. Is this a concern?
Thanks in advance.