I am trying to estimate the posterior for hyperparameters of a covariance function. Here is a Kruschke diagram of my model.

I am looking for speedups, preferably using NUTS, since it seems to sample the hyperparameters better. Metropolis is ~100x faster per sample (although not per *effective* sample!). I am using a non-centered parameterization, but am using the generic inference button. Any hints, tips, or tricks to speed this up? Here is my model code:

```
import numpy as np
import pymc3 as pm
import theano.tensor as tt

with pm.Model() as GP_uncentered:
    # non-centered version:
    # to sample from MvN(0, K), multiply L z, where L L^T = K
    # and z ~ N(0, 1) is the scale parameter of (in this case) the normal dist

    # ============== kernel ===============
    sigma_sq = 1    # pm.Gamma('sigma_sq', 5, 10)  # sets maximum covariance ij
    lsq = 1e4       # pm.Gamma('lsq', alpha=5, beta=7e-5)
    sigma_n = .01
    # note: np.exp works here only because lsq is a constant;
    # it would need to be tt.exp if lsq were a random variable
    Kij = sigma_sq * (np.exp(-Dmatsq / lsq) + np.diag([sigma_n] * N))
    # ====== kernel informing prior slope covariance ======
    z = pm.Normal('z', 0., 1., shape=Num_farms)  # scale parameter
    b = pm.Deterministic('w', tt.dot(tt.slinalg.cholesky(Kij), z))  # prior slope
    a = pm.Normal('a', 0, .3, shape=Num_farms)   # prior intercept
    # ========== Linear Model ===============
    u = pm.Deterministic('mu', a[zero_farm_idx] + b[idx] * X)
    # ============ Likelihood ==============
    sigma = pm.Gamma('sigma', 5, 10)
    y = pm.Normal('y', mu=u, sd=sigma, observed=Y)
    # ============ sample =================
    # trace = pm.sample(500, tune=500, chains=2)
```

Matrix factorizations like the Cholesky decomposition scale as O(n³), so this is likely a significant portion of the time spent sampling. However, the MH algorithm is ~100x faster. Is this evidence that the gradient calculations necessary for HMC are slowing down NUTS by about a 100x factor? Any advice?
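To make the constant-hyperparameter case concrete: as the snippet above stands (sigma_sq, lsq, and sigma_n fixed rather than sampled), K and its Cholesky factor are constants, so they could in principle be computed once in NumPy rather than refactorized in the graph on every gradient evaluation. A hedged sketch, where the random `Dmatsq` is a made-up stand-in for the real squared-distance matrix:

```
import numpy as np

# Stand-in squared-distance matrix (the real Dmatsq comes from the data)
N = 50
pts = np.random.default_rng(1).uniform(0, 100, size=N)
Dmatsq = (pts[:, None] - pts[None, :]) ** 2

# With the hyperparameters fixed, K and L are plain NumPy constants
sigma_sq, lsq, sigma_n = 1.0, 1e4, 0.01
Kij = sigma_sq * (np.exp(-Dmatsq / lsq) + np.diag([sigma_n] * N))
L_const = np.linalg.cholesky(Kij)  # computed once, outside the model graph
```

Inside the model, `tt.dot(L_const, z)` would then use the precomputed factor directly; this only applies while the hyperparameters stay fixed, not once the Gamma priors are uncommented.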

- I am not getting divergent transition warnings.

- N_eff is fairly high for the hyperparameters with NUTS (about a third to a half of total samples).