Papers
Topics
Authors
Recent
Search
2000 character limit reached

A Unified Treatment of Partial Stragglers and Sparse Matrices in Coded Matrix Computation

Published 24 Sep 2021 in cs.IT and math.IT | (2109.12070v3)

Abstract: The overall execution time of distributed matrix computations is often dominated by slow worker nodes (stragglers) within the clusters. Recently, different coding techniques have been utilized to mitigate the effect of stragglers where worker nodes are assigned the job of processing encoded submatrices of the original matrices. In many machine learning or optimization problems the relevant matrices are often sparse. Several prior coded computation methods operate with dense linear combinations of the original submatrices; this can significantly increase the worker node computation times and consequently the overall job execution time. Moreover, several existing techniques treat the stragglers as failures (erasures) and discard their computations. In this work, we present a coding approach which operates with limited encoding of the original submatrices and utilizes the partial computations done by the slower workers. While our scheme can continue to have the optimal threshold of prior work, it also allows us to trade off the straggler resilience with the worker computation speed for sparse input matrices. Extensive numerical experiments done in AWS (Amazon Web Services) cluster confirm that the proposed approach enhances the speed of the worker computations (and thus the whole process) significantly.

Citations (9)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.