Impact of MF’s stack data movement on performance relative to LL and RL

Determine whether the data movement costs associated with managing the multifrontal (MF) method’s stack of update matrices—specifically the packing and related operations described in line 18 of Algorithm 1—are a primary cause of MF’s slower performance relative to the left-looking (LL) and right-looking (RL) serial supernodal sparse Cholesky factorization algorithms when using multithreaded BLAS.

Background

Under multithreaded BLAS, MF lagged behind LL and RL by a substantial margin in the reported performance profiles. Unlike LL and RL, MF maintains a stack of packed update matrices, involving packing/unpacking and movement of data as updates are propagated up the supernodal elimination tree.

The authors conjecture, without confirming it experimentally, that MF’s relative slowdown stems from the extra data movement needed to manage this stack, notably the packing step performed when an update matrix is pushed. Whether this cost actually accounts for the observed timing gap is the open question.
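To make the kind of data movement in question concrete, the following is a minimal sketch of packing an update matrix onto a stack and unpacking it again. It is not the authors' implementation: the function names `pack_lower` and `unpack_lower`, the column-by-column lower-triangle layout, and the use of plain Python lists are all assumptions made for illustration.

```python
def pack_lower(update, stack):
    """Push the lower triangle of a square update matrix onto a flat stack.

    Entries are copied column by column (an assumed layout).
    Returns the number of entries copied.
    """
    n = len(update)
    for j in range(n):
        for i in range(j, n):
            stack.append(update[i][j])
    return n * (n + 1) // 2


def unpack_lower(stack, n):
    """Pop the most recently packed n-by-n lower triangle off the stack."""
    count = n * (n + 1) // 2
    start = len(stack) - count
    packed = stack[start:]
    del stack[start:]
    # Rebuild a full n-by-n array with zeros above the diagonal.
    update = [[0.0] * n for _ in range(n)]
    k = 0
    for j in range(n):
        for i in range(j, n):
            update[i][j] = packed[k]
            k += 1
    return update


# Hypothetical 3-by-3 update matrix produced by a child supernode.
stack = []
child = [[4.0, 0.0, 0.0],
         [1.0, 5.0, 0.0],
         [2.0, 3.0, 6.0]]
moved = pack_lower(child, stack)   # copies only, no arithmetic
restored = unpack_lower(stack, 3)  # copies only, no arithmetic
```

Every entry of the update matrix is copied once on the way in and once on the way out, and neither step performs any floating-point work. Copies of this kind are the overhead the conjecture points to.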

References

“We conjecture that the costs of the data movement associated with MF's stack (see line 18 in Algorithm 1) are hurting the performance of MF relative to LL and RL.”

Source: Karsavuran et al., “Some new techniques to use in serial sparse Cholesky factorization algorithms” (arXiv:2409.13090, 2024), Section 3.2 (Results)
