I am trying to estimate the posterior for hyperparameters of a covariance function. Here is a Kruschke diagram of my model.
I am looking for speedups, preferably with NUTS, since it seems to sample the hyperparameters better, although Metropolis is ~100x faster per sample (though not per effective sample!). I am using a non-centered parameterization, but otherwise just the generic inference button. Any hints, tips, or tricks to speed this up? Here is my model code:
import numpy as np
import pymc3 as pm
import theano.tensor as tt

# Dmatsq, N, Num_farms, zero_farm_idx, idx, X, Y are defined from the data.

with pm.Model() as GP_uncentered:
    # non-centered version:
    # to sample from MvN(0, K), multiply L z, where L L^T = K
    # and z ~ N(0, 1); z acts as the scale parameter of (in this case) the normal dist

    # ============== kernel ===============
    sigma_sq = 1   # pm.Gamma('sigma_sq', 5, 10)  # sets maximum covariance ij
    lsq = 1e4      # pm.Gamma('lsq', alpha=5, beta=7e-5)
    sigma_n = .01
    Kij = sigma_sq * (np.exp(-Dmatsq / lsq) + np.diag([sigma_n] * N))

    # ========== kernel informing prior slope covariance ========
    z = pm.Normal('z', 0., 1., shape=Num_farms)                     # scale parameter
    b = pm.Deterministic('w', tt.dot(tt.slinalg.cholesky(Kij), z))  # prior slope
    a = pm.Normal('a', 0, .3, shape=Num_farms)                      # prior intercept

    # ========== Linear Model ===============
    u = pm.Deterministic('mu', a[zero_farm_idx] + b[idx] * X)

    # ============ Likelihood ==============
    sigma = pm.Gamma('sigma', 5, 10)
    y = pm.Normal('y', mu=u, sd=sigma, observed=Y)

    # ============ sample =================
    # trace = pm.sample(500, tune=500, chains=2)
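For reference, while the kernel hyperparameters are fixed constants (as in the commented-out version above), I believe the Cholesky factor can be precomputed once in NumPy outside the Theano graph, so it is not re-decomposed (and differentiated through) on every gradient evaluation. A minimal sketch of that idea, reusing the names above; it stops applying as soon as sigma_sq and lsq get priors:

# Sketch: precompute the Cholesky factor while the kernel hyperparameters are fixed.
sigma_sq, lsq, sigma_n = 1.0, 1e4, 0.01
Kij_np = sigma_sq * (np.exp(-Dmatsq / lsq) + np.diag([sigma_n] * N))
L = np.linalg.cholesky(Kij_np)  # plain NumPy constant, computed once

with pm.Model() as GP_uncentered_fixedK:
    z = pm.Normal('z', 0., 1., shape=Num_farms)
    b = pm.Deterministic('w', tt.dot(L, z))  # L enters the graph as a constant
    a = pm.Normal('a', 0, .3, shape=Num_farms)
    u = a[zero_farm_idx] + b[idx] * X
    sigma = pm.Gamma('sigma', 5, 10)
    y = pm.Normal('y', mu=u, sd=sigma, observed=Y)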
Matrix inversions (here, the Cholesky decomposition) scale as O(n³), so this is likely a significant portion of the time spent sampling. However, Metropolis is still ~100x faster per iteration. Is this evidence that the gradient calculations needed by HMC are slowing NUTS down by roughly a factor of 100? Any advice?
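To check this directly rather than guess, I understand PyMC3's Model.profile can time the logp and its gradient separately. A sketch (pymc3.theanof.gradient is an internal helper, so the import location may differ across versions):

from pymc3.theanof import gradient  # internal helper; location may vary by version

# Time the log-probability evaluation alone (what Metropolis pays per step)...
GP_uncentered.profile(GP_uncentered.logpt).summary()

# ...versus the gradient of the logp (what NUTS additionally pays per leapfrog step).
GP_uncentered.profile(gradient(GP_uncentered.logpt, GP_uncentered.vars)).summary()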
- I am not getting divergent-transition warnings.
- n_eff is fairly high for the hyperparameters with NUTS (roughly a third to a half of the total samples; see the ESS-per-second sketch below for how I am comparing the samplers).
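For completeness, the number I am trying to maximize is effective samples per second, not raw iterations per second. This is roughly how I am comparing the two samplers (the column is named 'n_eff' in older PyMC3 releases and 'ess_mean' in the newer ArviZ-based ones):

import time

with GP_uncentered:
    start = time.time()
    trace_nuts = pm.sample(500, tune=500, chains=2)  # default NUTS
    elapsed = time.time() - start

# Effective samples per second for each variable.
print(pm.summary(trace_nuts)['n_eff'] / elapsed)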