Decentralized Uncoded Storage Elastic Computing with Heterogeneous Computation Speeds (2403.00585v1)

Published 1 Mar 2024 in cs.IT and math.IT

Abstract: Elasticity plays an important role in modern cloud computing systems. Elastic computing allows virtual machines (i.e., computing nodes) to be preempted when high-priority jobs arrive, and also allows new virtual machines to join the computation. In 2018, Yang et al. introduced Coded Storage Elastic Computing (CSEC), which addresses elasticity through coding techniques with lower storage and computation load requirements. However, because its data storage relies on linear coding, CSEC is limited to certain types of computations (e.g., linear ones). Centralized Uncoded Storage Elastic Computing (CUSEC) with heterogeneous computation speeds was then proposed, which directly copies parts of the data onto the virtual machines. In all existing works on elastic computing, the storage assignment is centralized, meaning that the number and identities of all virtual machines possibly used during the computation are known at the time of storage assignment. In this paper, we consider Decentralized Uncoded Storage Elastic Computing (DUSEC) with heterogeneous computation speeds, where any available virtual machine can join the computation without being predicted, so coordination among the virtual machines' storage assignments is not possible. Under a decentralized storage assignment originally proposed for coded caching by Maddah-Ali and Niesen, we propose a computing scheme with a closed-form optimal computation time. We also run experiments on the MNIST dataset with a Softmax regression model on the Tencent cloud platform, and the experimental results demonstrate that the proposed DUSEC system approaches the computation time of the state-of-the-art storage assignment in the CUSEC system.
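The decentralized placement the abstract refers to can be illustrated with a small sketch. Below, each machine independently stores each data row with probability `mu` (Maddah-Ali–Niesen-style decentralized placement, with no coordination between machines), and a toy greedy scheduler then balances work across machines with heterogeneous speeds. The greedy assignment is an illustration only, not the paper's optimal closed-form scheme; all names and parameters here are assumptions for the sketch.

```python
import random

def decentralized_storage(num_rows, machines, mu, seed=0):
    """Decentralized uncoded placement: each machine independently stores
    each data row with probability mu, with no coordination among machines."""
    rng = random.Random(seed)
    return {m: {r for r in range(num_rows) if rng.random() < mu}
            for m in machines}

def greedy_assignment(storage, num_rows, speeds):
    """Toy greedy task assignment (not the paper's optimal scheme): each row
    goes to the machine that would finish earliest among those storing it,
    where a machine's finish time is its assigned load divided by its speed."""
    load = {m: 0 for m in storage}
    for r in range(num_rows):
        holders = [m for m in storage if r in storage[m]]
        if not holders:
            continue  # row stored nowhere; a real system would refetch it
        best = min(holders, key=lambda h: (load[h] + 1) / speeds[h])
        load[best] += 1
    return load
```

The overall computation time of an assignment is `max(load[m] / speeds[m])` over the machines; the paper derives the assignment minimizing this quantity in closed form under the decentralized placement.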

References (20)
  1. X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., “Mllib: Machine learning in apache spark,” The journal of machine learning research, vol. 17, no. 1, pp. 1235–1241, 2016.
  2. M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  3. Y. Yang, M. Interlandi, P. Grover, S. Kar, S. Amizadeh, and M. Weimer, “Coded elastic computing,” in 2019 IEEE International Symposium on Information Theory (ISIT), July 2019, pp. 2654–2658.
  4. S. Kiani, T. Adikari, and S. C. Draper, “Hierarchical coded elastic computing,” in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021, pp. 4045–4049.
  5. N. Woolsey, R.-R. Chen, and M. Ji, “Coded elastic computing on machines with heterogeneous storage and computation speed,” IEEE Transactions on Communications, vol. 69, no. 5, pp. 2894–2908, 2021.
  6. N. Woolsey, J. Kliewer, R.-R. Chen, and M. Ji, “A practical algorithm design and evaluation for heterogeneous elastic computing with stragglers,” arXiv preprint arXiv:, 2021.
  7. X. Zhong, J. Kliewer, and M. Ji, “Matrix multiplication with straggler tolerance in coded elastic computing via Lagrange code,” in 2023 IEEE International Conference on Communications (ICC), 2023, pp. 136–141.
  8. H. Dau, R. Gabrys, Y. C. Huang, C. Feng, Q. H. Luu, E. Alzahrani, and Z. Tari, “Optimizing the transition waste in coded elastic computing,” in 2020 IEEE International Symposium on Information Theory (ISIT).   IEEE, 2020, pp. 174–178.
  9. M. Ji, X. Zhang, and K. Wan, “A new design framework for heterogeneous uncoded storage elastic computing,” in 2022 20th International Symposium on Modeling and Optimization in Mobile, Ad hoc, and Wireless Networks (WiOpt), 2022, pp. 269–275.
  10. X. Zhong, J. Kliewer, and M. Ji, “Uncoded storage coded transmission elastic computing with straggler tolerance in heterogeneous systems,” arXiv:2401.12151, Jan. 2024.
  11. D. Mosk-Aoyama and D. Shah, “Fast distributed algorithms for computing separable functions,” IEEE Transactions on Information Theory, vol. 54, no. 7, pp. 2997–3007, 2008.
  12. K. Wan, H. Sun, M. Ji, and G. Caire, “Distributed linearly separable computation,” IEEE Transactions on Information Theory, vol. 68, no. 2, pp. 1259–1278, 2021.
  13. M. A. Maddah-Ali and U. Niesen, “Decentralized coded caching attains order-optimal memory-rate tradeoff,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1029–1040, Aug 2015.
  14. T. Jahani-Nezhad and M. A. Maddah-Ali, “Optimal communication-computation trade-off in heterogeneous gradient coding,” IEEE Journal on Selected Areas in Information Theory, vol. 2, no. 3, pp. 1002–1011, 2021.
  15. M. A. Maddah-Ali and U. Niesen, “Decentralized coded caching attains order-optimal memory-rate tradeoff,” IEEE/ACM Transactions on Networking, vol. 23, no. 4, pp. 1029–1040, 2015.
  16. ——, “Fundamental limits of caching,” IEEE Transactions on Information Theory, vol. 60, no. 5, pp. 2856–2867, 2014.
  17. R. Tandon, Q. Lei, A. G. Dimakis, and N. Karampatziakis, “Gradient coding: Avoiding stragglers in distributed learning,” in International Conference on Machine Learning.   PMLR, 2017, pp. 3368–3376.
  18. M. Ye and E. Abbe, “Communication-computation efficient gradient coding,” in International Conference on Machine Learning.   PMLR, 2018, pp. 5610–5619.
  19. H. Cao, Q. Yan, and X. Tang, “Adaptive gradient coding,” IEEE/ACM Trans. Networking, vol. 30, no. 2, pp. 717–734, Apr. 2022.
  20. N. Woolsey, R.-R. Chen, and M. Ji, “Uncoded placement with linear sub-messages for private information retrieval from storage constrained databases,” IEEE Transactions on Communications, vol. 68, no. 10, pp. 6039–6053, 2020.
