Advice for speeding up sampling for models with multiple evaluations of Cholesky factors

I am trying to estimate the posterior for hyperparameters of a covariance function. Here is a Kruschke diagram of my model.

I am looking for speedups, preferably with NUTS, since it seems to sample the hyperparameters better. Metropolis is ~100x faster per sample (although not per effective sample!). I am using a non-centered parameterization, but otherwise just the generic inference button. Any hints, tips, or tricks to speed this up? Here is my model code:

```python
with pm.Model() as GP_uncentered:
    # non-centered version:
    # to sample from MvN(0, K), multiply L z, where L L^T = K
    # and z ~ N(0, 1) acts as the scale parameter of the normal

    # ============== kernel ===============
    sigma_sq = 1    # pm.Gamma('sigma_sq', 5, 10)  # sets maximum covariance ij
    lsq = 1e4       # pm.Gamma('lsq', alpha=5, beta=7e-5)
    sigma_n = .01
    Kij = sigma_sq*(np.exp(-Dmatsq/lsq) + np.diag([sigma_n]*N))

    # ========== kernel informing prior slope covariance ========
    L = np.linalg.cholesky(Kij)  # use tt.slinalg.cholesky once the hyperpriors are uncommented
    z = pm.Normal('z', 0., 1., shape=Num_farms)  # scale parameter
    b = pm.Deterministic('w',, z))      # prior slope

    a = pm.Normal('a', 0, .3, shape=Num_farms)   # prior intercept

    # ========== linear model ===============
    u = pm.Deterministic('mu', a[zero_farm_idx] + b[idx]*X)

    # ============ likelihood ==============
    sigma = pm.Gamma('sigma', 5, 10)
    y = pm.Normal('y', mu=u, sd=sigma, observed=Y)

    # ============ sample =================
    # trace = pm.sample(500, tune=500, chains=2)
```
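As a sanity check on the non-centered trick in the comments above, here is a plain-numpy sketch (independent of PyMC3, with a made-up 3-point squared-exponential kernel) confirming that `L z` draws have covariance `K`:

```python
import numpy as np

rng = np.random.RandomState(0)

# Made-up squared-exponential kernel on 3 inputs, plus jitter on the diagonal
x = np.array([0.0, 0.5, 1.0])
Dmatsq = (x[:, None] - x[None, :])**2
K = np.exp(-Dmatsq / 0.5) + 1e-2*np.eye(3)

# Non-centered draw: z ~ N(0, I), then L z ~ MvN(0, K), where L L^T = K
L = np.linalg.cholesky(K)
z = rng.standard_normal((3, 200_000))
samples = L @ z

emp_cov = np.cov(samples)           # empirical covariance of the draws
print(np.max(np.abs(emp_cov - K)))  # should be small
```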

Cholesky factorization (like matrix inversion) scales O(n^3), so it is likely a significant portion of the time spent sampling. However, MH is ~100x faster per sample. Is this evidence that the gradient calculations needed for HMC are slowing NUTS down by roughly that factor? Any advice?

- I am not getting divergent-transition warnings.
- N_eff is fairly high for the hyperparameters with NUTS (about one third to one half of the samples).
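On the O(n^3) point: whether or not it explains the NUTS slowdown, it is worth never forming an explicit inverse. Solving against the Cholesky factor gives the same result more cheaply and more stably, and the factor can be reused across solves (numpy/scipy sketch with a random SPD matrix):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.RandomState(1)

# Random symmetric positive-definite matrix and right-hand side
n = 200
A = rng.standard_normal((n, n))
K = A @ A.T + n*np.eye(n)
y = rng.standard_normal(n)

# Explicit inverse (avoid this): full O(n^3) inversion, numerically worse
alpha_inv = np.linalg.inv(K) @ y

# Cholesky factor + triangular solves: factor once, reuse for repeated solves
c, low = cho_factor(K)
alpha_chol = cho_solve((c, low), y)

print(np.allclose(alpha_inv, alpha_chol))
```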

Maybe try pm.gp with an approximation? From a quick look there is nothing obvious to speed up :sweat_smile: