Communication-Optimal Parallel Algorithm for Strassen's Matrix Multiplication (1202.3173v1)

Published 14 Feb 2012 in cs.DS, cs.CC, cs.DC, cs.NA, math.CO, and math.NA

Abstract: Parallel matrix multiplication is one of the most studied fundamental problems in distributed and high performance computing. We obtain a new parallel algorithm that is based on Strassen's fast matrix multiplication and minimizes communication. The algorithm outperforms all known parallel matrix multiplication algorithms, classical and Strassen-based, both asymptotically and in practice. A critical bottleneck in parallelizing Strassen's algorithm is the communication between the processors. Ballard, Demmel, Holtz, and Schwartz (SPAA'11) prove lower bounds on these communication costs, using expansion properties of the underlying computation graph. Our algorithm matches these lower bounds, and so is communication-optimal. It exhibits perfect strong scaling within the maximum possible range. Benchmarking our implementation on a Cray XT4, we obtain speedups over classical and Strassen-based algorithms ranging from 24% to 184% for a fixed matrix dimension n=94080, where the number of nodes ranges from 49 to 7203. Our parallelization approach generalizes to other fast matrix multiplication algorithms.
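For readers unfamiliar with the sequential building block the abstract refers to, the following is a minimal sketch of Strassen's recursion, which replaces the eight block products of the classical 2x2 block scheme with seven. It is not the parallel, communication-optimal algorithm of the paper, and it does not reproduce the authors' implementation. The function names (`strassen`, `classical`, `split`, `add`, `sub`) and the `cutoff` parameter are illustrative assumptions, and the sketch assumes square inputs given as lists of lists.

```python
def add(A, B):
    """Entrywise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Entrywise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(A):
    """Split an even-dimension square matrix into four quadrants."""
    h = len(A) // 2
    return ([r[:h] for r in A[:h]], [r[h:] for r in A[:h]],
            [r[:h] for r in A[h:]], [r[h:] for r in A[h:]])


def classical(A, B):
    """Classical O(n^3) product, used as the recursion base case."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols]
            for row in A]


def strassen(A, B, cutoff=2):
    """Sequential Strassen product of two square matrices.

    Falls back to the classical product when the dimension is at most
    `cutoff` (a hypothetical tuning parameter) or is odd.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return classical(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive block products instead of eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    # Recombine the seven products into the four output quadrants.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

Calling `strassen(A, B)` on two square integer matrices should return the same result as `classical(A, B)`. A parallel version has to decide how the seven subproblems and their operands are distributed across processors, which is where the communication costs discussed in the abstract arise.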

Authors (5)
  1. Grey Ballard (36 papers)
  2. James Demmel (54 papers)
  3. Olga Holtz (16 papers)
  4. Benjamin Lipshitz (7 papers)
  5. Oded Schwartz (14 papers)
Citations (136)
