
Hierarchical Coded Matrix Multiplication (1907.08818v1)

Published 20 Jul 2019 in cs.IT, cs.DC, and math.IT

Abstract: Slow working nodes, known as stragglers, can greatly reduce the speed of distributed computation. Coded matrix multiplication is a recently introduced technique that enables straggler-resistant distributed multiplication of large matrices. A key property is that the finishing time depends only on the work completed by a set of the fastest workers, while the work done by the slowest workers is ignored completely. This paper is motivated by the observation that in real-world commercial cloud computing systems such as Amazon's Elastic Compute Cloud (EC2) the distinction between fast and slow nodes is often a soft one. Thus, if we could also exploit the work completed by stragglers we may realize substantial performance gains. To realize such gains, in this paper we use the idea of hierarchical coding (Ferdinand and Draper, IEEE Int. Symp. Inf. Theory, 2018). We decompose the overall matrix multiplication task into a hierarchy of heterogeneously sized subtasks. The duty to complete each subtask is shared amongst all workers, and each subtask is (generally) of a different complexity. The motivation for the hierarchical decomposition is the recognition that more workers will finish the first subtask than the second (or third, fourth, etc.). Connecting to error correction coding, earlier subtasks can therefore be designed to be of a higher rate than later subtasks. Through this hierarchical design our scheme exploits the work completed by stragglers, rather than ignoring it, even if that amount is much less than that completed by the fastest workers. We numerically show that our method realizes a 60% improvement in the expected finishing time for a widely studied statistical model of the speed of computation and, on Amazon EC2, the gain is 35%.
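The hierarchical idea can be sketched in a few lines of NumPy. The following is a minimal illustration under simplifying assumptions, not the paper's exact construction: each layer of the hierarchy is a slice of the matrix protected by its own MDS code (a Vandermonde generator here), with earlier layers given a higher rate (a larger recovery threshold) because more workers reach them. The per-worker completion counts in `done` are invented for illustration; in practice they would come from the actual worker timing.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 6                # number of workers
ks = [5, 4, 3]       # recovery threshold per layer: earlier layers are higher rate
n = 8                # columns of A
r = 2                # rows per uncoded block

x = rng.standard_normal(n)

# Build per-layer data: layer j is an independent row-slice of A with ks[j] blocks,
# encoded into p coded blocks so that ANY ks[j] workers suffice to decode it.
layers = []
for k in ks:
    A_j = rng.standard_normal((k * r, n))
    blocks = A_j.reshape(k, r, n)
    # p x k Vandermonde generator with distinct nodes: any k rows are invertible.
    G = np.vander(np.arange(1, p + 1), k, increasing=True).astype(float)
    coded = np.einsum('wk,krn->wrn', G, blocks)   # p coded blocks, one per worker
    layers.append((A_j, G, coded))

# Simulate soft stragglers: worker i completes its first done[i] layers, in order.
# Even the slowest worker contributes its layer-1 result instead of being ignored.
done = np.array([3, 3, 1, 2, 3, 2])

recovered = []
for j, (A_j, G, coded) in enumerate(layers):
    finished = np.flatnonzero(done >= j + 1)       # workers that reached layer j
    S = finished[:ks[j]]                           # any ks[j] of them suffice
    results = np.array([coded[i] @ x for i in S])  # ks[j] coded block-products
    # Invert the code: solve G[S] @ Y = results for the uncoded block-products.
    Y = np.linalg.solve(G[S], results.reshape(ks[j], -1))
    recovered.append(Y.reshape(-1))

y = np.concatenate(recovered)   # the full product A @ x, assembled layer by layer
```

With the completion counts above, all six workers contribute to layer 1 (threshold 5), five reach layer 2 (threshold 4), and three reach layer 3 (threshold 3), so every layer decodes and no completed work is wasted.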

Authors (3)
  1. Shahrzad Kiani (10 papers)
  2. Nuwan Ferdinand (8 papers)
  3. Stark C. Draper (47 papers)
Citations (35)
