On Scale-out Deep Learning Training for Cloud and HPC (1801.08030v1)

Published 24 Jan 2018 in cs.DC and cs.LG

Abstract: The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today's state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) is still a challenging problem and requires continued research/development. This entails innovations spanning algorithms, frameworks, communication libraries, and system design. In this paper, we describe the philosophy, design, and implementation of Intel Machine Learning Scalability Library (MLSL) and present proof-points demonstrating scaling DL training on 100s to 1000s of nodes across Cloud and HPC systems.
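The paper centers on scaling synchronous SGD across many nodes with an efficient communication library (Intel MLSL). As a rough illustration only, and not MLSL's API or the authors' implementation, the sketch below shows the basic synchronous data-parallel pattern the paper targets: each rank computes a local gradient, the gradients are averaged with an allreduce, and every rank applies the same update. It uses mpi4py and NumPy; the model size, learning rate, and gradient function are hypothetical placeholders.

```python
# Minimal sketch of synchronous data-parallel SGD with gradient allreduce.
# This is NOT Intel MLSL; it only illustrates the communication pattern,
# using mpi4py + NumPy. Model, data, and gradient are placeholders.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, world_size = comm.Get_rank(), comm.Get_size()

dim = 1024                  # hypothetical model size
weights = np.zeros(dim)     # every rank starts from identical weights
lr = 0.01

def local_gradient(w, rank, step):
    # Placeholder for a gradient computed on this rank's shard of the data.
    rng = np.random.default_rng(seed=step * world_size + rank)
    return rng.standard_normal(dim) * 0.01 + w * 1e-4

for step in range(100):
    grad = local_gradient(weights, rank, step)

    # Synchronous step: sum gradients from all ranks, then average.
    global_grad = np.empty_like(grad)
    comm.Allreduce(grad, global_grad, op=MPI.SUM)
    global_grad /= world_size

    # Identical averaged update on every rank keeps the replicas in sync.
    weights -= lr * global_grad

if rank == 0:
    print("finished", step + 1, "synchronous SGD steps on", world_size, "ranks")
```

Run, for example, with `mpirun -np 4 python sync_sgd_sketch.py`. The allreduce in the loop is exactly the operation whose cost dominates at scale, which is why the paper focuses on the communication library and system design rather than the optimizer itself.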

Authors (11)
  1. Srinivas Sridharan (24 papers)
  2. Karthikeyan Vaidyanathan (2 papers)
  3. Dhiraj Kalamkar (15 papers)
  4. Dipankar Das (86 papers)
  5. Mikhail E. Smorkalov (6 papers)
  6. Mikhail Shiryaev (3 papers)
  7. Dheevatsa Mudigere (35 papers)
  8. Naveen Mellempudi (11 papers)
  9. Sasikanth Avancha (20 papers)
  10. Bharat Kaul (23 papers)
  11. Pradeep Dubey (31 papers)
Citations (30)
