KroneckerNormal speed

Your code looks good. It looks like the worst offenders are the Scan and Eigh operations. Scan is sort of how Theano does loops, and it’s known to be pretty slow. The pymc3 source where it’s being called by KroneckerNormal is here, by the kron_matrix_op function. It is used to compute things like the dot product (also solves) between n Kronecker’ed matrices and some vector y,

(K_1 \otimes K_2 \otimes \cdots \otimes K_n) y

Scan is used here because Theano doesn’t know n. This problem doesn’t occur for MatrixNormal because it’s for n=2. It would be very good if we could get rid of the use of Scan. That would speed up your code by about ~50%.

It appears that the Eigh decomposition is slower in general than the Cholesky decomposition. When running on the cpu, Theano calls scipy.linalg.eigh and scipy.linalg.cholesky under the hood, see here and here. For timings on my machine (4 core i5 processor using mkl), I get:

# make symmetric 1000x1000 matrix
A = np.random.randn(1000,1000)
K = np.dot(A.T, A) + np.eye(1000)*1e-6

%timeit sp.linalg.eigh(K)
167 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit sp.linalg.cholesky(K)
14.7 ms ± 54.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On your machine, these timings ought to match up pretty well with the time per call’s for these ops in your profiling. So Eigh is just slower than Cholesky.

Chapter 5 of Saatchi’s thesis is a good place to find detailed info about using the Eigendecomposition for Kronecker structed covariance with spherical noise.

1 Like