On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations (2108.09337v2)

Published 20 Aug 2021 in cs.DC, cs.CC, and cs.PF

Abstract: Matrix factorizations are among the most important building blocks of scientific computing. State-of-the-art libraries, however, are not communication-optimal, underutilizing current parallel architectures. We present novel algorithms for Cholesky and LU factorizations that utilize an asymptotically communication-optimal 2.5D decomposition. We first establish a theoretical framework for deriving parallel I/O lower bounds for linear algebra kernels, and then utilize its insights to derive Cholesky and LU schedules, both communicating N³/(P·√M) elements per processor, where P is the number of processors and M is the local memory size. The empirical results match our theoretical analysis: our implementations communicate significantly less than Intel MKL, SLATE, and the asymptotically communication-optimal CANDMC and CAPITAL libraries. Our code outperforms these state-of-the-art libraries in almost all tested scenarios, with matrix sizes ranging from 2,048 to 262,144 on up to 512 CPU nodes of the Piz Daint supercomputer, decreasing the time-to-solution by up to three times. Our code is ScaLAPACK-compatible and available as an open-source library.
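The abstract's headline bound, N³/(P·√M) elements communicated per processor, can be made concrete with a small numeric sketch. The snippet below is illustrative only (not the paper's code); the function names and the classic 2D comparison bound N²/√P, typical of ScaLAPACK-style block-cyclic layouts, are assumptions introduced here for illustration.

```python
import math

def comm_25d(N: int, P: int, M: int) -> float:
    """Per-processor communication volume of the paper's 2.5D bound:
    N^3 / (P * sqrt(M)) elements, where M is the local memory size."""
    return N**3 / (P * math.sqrt(M))

def comm_2d(N: int, P: int) -> float:
    """Per-processor communication volume of a classic 2D schedule
    (assumed here for comparison): N^2 / sqrt(P) elements."""
    return N**2 / math.sqrt(P)

# Hypothetical configuration: a 65,536 x 65,536 matrix on 512 processors,
# each with room for 2^27 matrix elements in local memory.
N, P, M = 65_536, 512, 2**27
print(f"2.5D bound: {comm_25d(N, P, M):.3e} elements/processor")
print(f"2D bound:   {comm_2d(N, P):.3e} elements/processor")
```

Under these assumed parameters the 2.5D schedule moves roughly 4x fewer elements per processor than the 2D layout, which is the kind of gap the abstract's measurements against Intel MKL and SLATE reflect.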

Authors (11)
  1. Marko Kabić
  2. Tal Ben-Nun
  3. Alexandros Nikolaos Ziogas
  4. Jens Eirik Saethre
  5. André Gaillard
  6. Timo Schneider
  7. Maciej Besta
  8. Anton Kozhevnikov
  9. Joost VandeVondele
  10. Torsten Hoefler
  11. Grzegorz Kwasniewski
Citations (15)
