Distributed Optimization using Heterogeneous Compute Systems (2110.08941v1)

Published 3 Oct 2021 in cs.LG and cs.DC

Abstract: Hardware compute power has grown at an unprecedented rate in recent years. Exploiting these advances is key to producing better results in less time, in both academia and industry. However, combining existing hardware with the latest hardware in the same ecosystem is challenging, and one of the key challenges is varying compute power. In this paper, we consider the training of deep neural networks on a distributed system of workers with varying compute power. A naive implementation of synchronous distributed training results in the faster workers waiting for the slowest worker to finish processing. To mitigate this issue, we propose to dynamically adjust the data assigned to each worker during training: each worker is assigned a partition of the total data proportional to its compute power. Our experiments show that dynamically adjusting the data partition improves system utilization and significantly reduces training time. Code is available at the repository: https://github.com/vineeths96/Heterogeneous-Systems.
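The abstract states the partitioning rule only in words. Below is a minimal sketch of it, assuming each worker's compute power can be summarized as one positive throughput number (samples per unit time). The function names `proportional_partition` and `sync_epoch_stats` and the example speeds are hypothetical, not taken from the paper or its repository. The sketch also shows why a naive equal split wastes time: in synchronous training every worker waits for the slowest one.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def proportional_partition(total: int, speeds: Sequence[float]) -> List[int]:
    """Split `total` samples into integer shards proportional to `speeds`.

    Cumulative boundaries are rounded (rather than each shard on its own),
    so shard sizes are non-negative and always sum exactly to `total`.
    """
    if not speeds or any(s <= 0 for s in speeds):
        raise ValueError("speeds must be a non-empty list of positive numbers")
    weight = sum(speeds)
    cum = list(accumulate(speeds))
    bounds = [0] + [round(total * c / weight) for c in cum[:-1]] + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def sync_epoch_stats(shards: Sequence[int], speeds: Sequence[float]) -> Tuple[float, float]:
    """Return (epoch_time, total_idle_time) for one synchronous epoch.

    The epoch lasts as long as the slowest worker, max(n_i / s_i); the idle
    time is how long the faster workers spend waiting for it.
    """
    busy = [n / s for n, s in zip(shards, speeds)]
    epoch_time = max(busy)
    return epoch_time, sum(epoch_time - b for b in busy)


# Hypothetical example: three workers with relative throughputs 1 : 2 : 4.
speeds = [1.0, 2.0, 4.0]
equal = proportional_partition(700, [1.0, 1.0, 1.0])  # naive equal split
prop = proportional_partition(700, speeds)            # split by compute power
print(equal, sync_epoch_stats(equal, speeds))  # [233, 234, 233] (233.0, 290.75)
print(prop, sync_epoch_stats(prop, speeds))    # [100, 200, 400] (100.0, 0.0)
```

With these made-up speeds, the equal split leaves the two faster workers idle for 290.75 time units per epoch. The proportional split removes that idle time and cuts the epoch from 233 to 100 units.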

GitHub

GitHub - vineeths96/Heterogeneous-Systems: We present an algorithm to dynamically adjust the data assigned to each worker at every epoch during training in a heterogeneous cluster. We empirically evaluate the performance of the dynamic partitioning by training deep neural networks on the CIFAR-10 dataset.
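The repository description says the partition is recomputed at every epoch. The loop below is one way to sketch that, under an assumption of ours that is not necessarily the repository's estimator: each worker's throughput is measured as samples processed divided by elapsed time in the previous epoch, and that measurement is used directly as its weight for the next epoch. `partition`, `simulate`, and the per-epoch speeds are hypothetical, and the timings are simulated rather than measured on real hardware.

```python
from itertools import accumulate
from typing import List, Sequence


def partition(total: int, speeds: Sequence[float]) -> List[int]:
    """Integer shard sizes proportional to `speeds`, summing exactly to `total`."""
    weight = sum(speeds)
    cum = list(accumulate(speeds))
    bounds = [0] + [round(total * c / weight) for c in cum[:-1]] + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def simulate(total: int, true_speeds_per_epoch: Sequence[Sequence[float]]) -> List[float]:
    """Re-partition every epoch using throughput measured in the previous epoch.

    true_speeds_per_epoch[e][i] is a hypothetical samples-per-second rate of
    worker i in epoch e; a real system would time each worker instead.
    Returns the synchronous wall-clock time of each epoch.
    """
    n_workers = len(true_speeds_per_epoch[0])
    estimate = [1.0] * n_workers  # no prior knowledge: start with an equal split
    epoch_times = []
    for true_speeds in true_speeds_per_epoch:
        shards = partition(total, estimate)
        elapsed = [n / s for n, s in zip(shards, true_speeds)]
        epoch_times.append(max(elapsed))  # everyone waits for the slowest worker
        # Measured throughput (samples / elapsed time) becomes the next estimate;
        # a worker that got no samples keeps its previous estimate.
        estimate = [n / t if t > 0 else est
                    for n, t, est in zip(shards, elapsed, estimate)]
    return epoch_times


# Hypothetical run: the third worker slows from 4.0 to 1.0 in the third epoch.
times = simulate(700, [[1.0, 2.0, 4.0], [1.0, 2.0, 4.0],
                       [1.0, 2.0, 1.0], [1.0, 2.0, 1.0]])
print(times)  # [233.0, 100.0, 400.0, 175.0]
```

In this simulated run the first epoch uses an equal split and takes 233 units. Once throughputs have been measured, the epoch time drops to 100. When the third worker slows down, that epoch takes 400 units because the estimate lags by one epoch, and the next re-partition brings it back to 175. Using the raw last-epoch measurement keeps the sketch short; smoothing the estimates (for example with a moving average) is a common alternative when timings are noisy.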
