I have a model that includes the calculation of an inverse matrix. The matrix is 8k * 8k. The estimation time of the model using NUTS is 432 hours for only 1000 iterations (plus 1000 tunes). I put the simplified model below (the true model only has more parameters, but the structure is the same). I will appreciate it if anyone has any idea to increase its speed.

```
with pm.Model() as model: # model specifications in PyMC3 are wrapped in a with-statement
# Parameters
b0_e = pm.Normal('b0', 0, sigma = 1)
b11_e = pm.Normal('b11', 0, sigma = 1)
b12_e = pm.Normal('b12', 0, sigma = 1)
b41_e = pm.Normal('b41', 1, sigma = 1)
b42_e = pm.Normal('b42', 1, sigma = 1)
sigma2 = pm.ChiSquared('e2', nu=2)
# Calculation of inverse matrix
# G1 is 8000 * 8000
bg_e = np.eye(N) - b2_e * G1
bg_e_t = np.eye(N) - b2_e * G1.transpose()
i_bg_e = pm.math.matrix_inverse(bg_e)
i_bg_e_t = pm.math.matrix_inverse(bg_e_t)
# linear combinations
bx_all = np.ones(N)*b0_e + b11_e * X[:,0] + b12_e * X[:,1] +b41_e * GX1[:,0] + b42_e * GX1[:,1]
# matrix multiplication
mu_y = pm.math.matrix_dot(i_bg_e, bx_all)
sigma_y = pm.math.extract_diag(pm.math.matrix_dot(i_bg_e, i_bg_e_t)) * sigma2
likelihood = pm.Normal('y', mu = mu_y , sigma = sigma_y, observed = Y)
trace = pm.sample(1000, return_inferencedata=True, cores = 2)
```