Workload-aware Automatic Parallelization for Multi-GPU DNN Training (1811.01532v2)

Published 5 Nov 2018 in cs.DC

Abstract: Deep neural networks (DNNs) have emerged as successful solutions for a variety of artificial intelligence applications, but their very large and deep models impose high computational requirements during training. Multi-GPU parallelization is a popular option to accelerate demanding computations in DNN training, but most state-of-the-art multi-GPU deep learning frameworks not only require users to have an in-depth understanding of the implementation of the frameworks themselves, but also apply parallelization in a straightforward way without optimizing GPU utilization. In this work, we propose a workload-aware auto-parallelization framework (WAP) for DNN training, where the work is automatically distributed to multiple GPUs based on the workload characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks (AlexNet and VGG-16), and show competitive training throughput compared with the state-of-the-art frameworks, and also demonstrate that WAP automatically optimizes GPU assignment based on the workload's compute requirements, thereby improving energy efficiency.
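
The abstract describes distributing DNN training work across GPUs according to workload characteristics. Below is a minimal, hypothetical sketch of that general idea, a greedy balancing of estimated per-layer compute across GPUs; it is not the paper's actual WAP algorithm, and all layer names and FLOP figures are illustrative assumptions.

```python
# Hypothetical sketch: greedy, workload-aware assignment of layers to GPUs.
# The layer names and FLOP counts below are illustrative, not taken from the paper.

from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    flops: float  # estimated forward-pass compute in GFLOPs


def assign_layers(layers, num_gpus):
    """Place each layer on the currently least-loaded GPU, largest layers first."""
    load = [0.0] * num_gpus   # accumulated GFLOPs per GPU
    placement = {}            # layer name -> GPU index
    for layer in sorted(layers, key=lambda l: l.flops, reverse=True):
        gpu = min(range(num_gpus), key=lambda g: load[g])
        placement[layer.name] = gpu
        load[gpu] += layer.flops
    return placement, load


# Rough per-layer costs for a VGG-16-like network (illustrative numbers only).
layers = [
    Layer("conv1_1", 0.17), Layer("conv1_2", 3.7),
    Layer("conv2_1", 1.85), Layer("conv2_2", 3.7),
    Layer("conv3_1", 1.85), Layer("fc6", 0.21), Layer("fc7", 0.03),
]

placement, load = assign_layers(layers, num_gpus=2)
print(placement)
print("per-GPU load (GFLOPs):", load)
```

Running the sketch prints a layer-to-GPU mapping with roughly balanced per-GPU compute, which is the kind of workload-aware assignment the abstract contrasts with straightforward, utilization-agnostic parallelization.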

Authors (6)
  1. Sungho Shin (52 papers)
  2. Youngmin Jo (2 papers)
  3. Jungwook Choi (28 papers)
  4. Swagath Venkataramani (14 papers)
  5. Vijayalakshmi Srinivasan (4 papers)
  6. Wonyong Sung (33 papers)
Citations (1)
