Random Alloy Codes and the Fundamental Limits of Coded Distributed Tensors

Published 7 Feb 2022 in cs.IT, cs.DC, cs.LG, cs.NA, cs.SC, math.IT, and math.NA | arXiv:2202.03469v7

Abstract: Tensor operations are fundamental in distributed computing, e.g., in machine learning, where they are commonly split into multiple parallel tasks over large datasets. Stragglers and other failures can severely impact the overall completion time. Recent works in coded computing provide a novel strategy for mitigating stragglers with coded tasks, with the objective of minimizing the number of tasks needed to recover the overall result, known as the recovery threshold. However, we demonstrate that this strict combinatorial definition does not directly optimize the probability of failure. In this paper, we focus on the most likely event and measure the optimality of a coding scheme more directly by its probability of decoding. Our probabilistic approach leads to a practical construction of random codes for matrix multiplication, namely locally random alloy codes, which are optimal with respect to these measures. Furthermore, the probabilistic approach allows us to discover a surprising impossibility theorem about both random and deterministic coded distributed tensors.
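
To make the probabilistic viewpoint concrete, below is a minimal sketch (in Python/NumPy) of a generic dense random code for distributed matrix multiplication, assuming Gaussian coefficients over the reals; the `encode`/`decode` helpers and all shapes are illustrative and are not the paper's locally random alloy construction. The master recovers A @ B from any set of returned workers whose coding coefficients are linearly independent, so decoding succeeds almost surely once m*n of the coded tasks return, rather than being tied to a fixed combinatorial recovery threshold.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(A, B, m, n, num_workers):
    """Send each worker one random linear combination of the m row blocks
    of A and one of the n column blocks of B (a dense random linear code)."""
    A_blocks = np.split(A, m, axis=0)
    B_blocks = np.split(B, n, axis=1)
    Ca = rng.standard_normal((num_workers, m))  # random generator matrix for A
    Cb = rng.standard_normal((num_workers, n))  # random generator matrix for B
    tasks = [(sum(Ca[w, i] * A_blocks[i] for i in range(m)),
              sum(Cb[w, j] * B_blocks[j] for j in range(n)))
             for w in range(num_workers)]
    # Worker w returns Atilde_w @ Btilde_w, i.e. the combination of the
    # m*n block products A_i @ B_j with coefficients Ca[w, i] * Cb[w, j];
    # G collects these coefficients (a row-wise Khatri-Rao product).
    G = np.einsum('wi,wj->wij', Ca, Cb).reshape(num_workers, m * n)
    return G, tasks

def decode(G, results, returned, m, n):
    """Recover A @ B from any set of returned workers whose rows of G are
    linearly independent; with Gaussian coefficients this holds almost
    surely as soon as len(returned) == m * n."""
    R = np.stack([results[w].ravel() for w in returned])      # (k, p*q)
    blocks, *_ = np.linalg.lstsq(G[returned], R, rcond=None)  # (m*n, p*q)
    p, q = results[returned[0]].shape
    rows = [np.hstack([blocks[i * n + j].reshape(p, q) for j in range(n)])
            for i in range(m)]
    return np.vstack(rows)

# 12 workers, block products split 2 x 2: any 4 responses decode (a.s.).
A, B = rng.standard_normal((8, 6)), rng.standard_normal((6, 10))
G, tasks = encode(A, B, m=2, n=2, num_workers=12)
results = {w: At @ Bt for w, (At, Bt) in enumerate(tasks)}  # worker outputs
fast = [1, 4, 7, 9]   # suppose only these workers finished in time
assert np.allclose(decode(G, results, fast, m=2, n=2), A @ B)
```

Over a finite field the same argument becomes probabilistic rather than almost sure: a uniformly random square coefficient matrix fails to be invertible with probability that vanishes as the field size grows, which is the kind of decoding-probability measure the abstract contrasts with deterministic recovery-threshold constructions such as polynomial codes.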
