Serial assembly operations as a bottleneck for LL, RL, and MF with multithreaded BLAS
Determine whether the performance degradation observed for the left-looking (LL), right-looking (RL), and multifrontal (MF) serial supernodal sparse Cholesky factorization algorithms under Intel’s MKL multithreaded BLAS is caused primarily by their assembly operations executing serially. The right-looking blocked (RLB) algorithm, by contrast, performs all floating-point work within multithreaded BLAS kernels and avoids assembly.
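To make the distinction concrete, the following is a minimal sketch of the two kinds of work involved: a dense product that a BLAS library can run with multiple threads, followed by an assembly step that scatters the result into a larger target block through an index map. It is not the authors' implementation; the function names `dense_update` and `assemble`, the index array `rel_idx`, and the tiny sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def dense_update(panel):
    # Dense SYRK-like product; NumPy delegates this to the linked BLAS,
    # which may run it with several threads.
    return panel @ panel.T

def assemble(target, update, rel_idx):
    # Assembly (hypothetical sketch): subtract update[i, j] from
    # target[rel_idx[i], rel_idx[j]]. This is indexed scatter work,
    # not a BLAS call, so BLAS threading does not speed it up.
    # rel_idx is assumed to contain no repeated indices.
    target[np.ix_(rel_idx, rel_idx)] -= update
    return target

# Illustrative data only: a 2x1 panel updating rows/columns 1 and 3
# of a 4x4 target block.
panel = np.array([[1.0], [2.0]])
rel_idx = np.array([1, 3])
target = np.zeros((4, 4))

update = dense_update(panel)
assemble(target, update, rel_idx)
```

In this sketch only `dense_update` can benefit from a multithreaded BLAS; the time spent in `assemble` stays the same no matter how many BLAS threads are available, which is the kind of serial portion the question is about.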
References
We conjecture that the performance of LL, RL, and MF suffers seriously due to the fact that the assembly operations are performed serially.
— Some new techniques to use in serial sparse Cholesky factorization algorithms
(2409.13090 - Karsavuran et al., 19 Sep 2024) in Section 3.2 (Results)