
Distributed Deep Learning in Open Collaborations (2106.10207v2)

Published 18 Jun 2021 in cs.LG and cs.DC

Abstract: Modern deep learning applications require increasingly more compute to train state-of-the-art models. To address this demand, large corporations and institutions use dedicated High-Performance Computing clusters, whose construction and maintenance are both environmentally costly and well beyond the budget of most organizations. As a result, some research directions become the exclusive domain of a few large industrial and even fewer academic actors. To alleviate this disparity, smaller groups may pool their computational resources and run collaborative experiments that benefit all participants. This paradigm, known as grid- or volunteer computing, has seen successful applications in numerous scientific areas. However, using this approach for machine learning is difficult due to high latency, asymmetric bandwidth, and several challenges unique to volunteer computing. In this work, we carefully analyze these constraints and propose a novel algorithmic framework designed specifically for collaborative training. We demonstrate the effectiveness of our approach for SwAV and ALBERT pretraining in realistic conditions and achieve performance comparable to traditional setups at a fraction of the cost. Finally, we provide a detailed report of successful collaborative LLM pretraining with 40 participants.

Authors (16)
  1. Michael Diskin (6 papers)
  2. Alexey Bukhtiyarov (2 papers)
  3. Max Ryabinin (29 papers)
  4. Lucile Saulnier (10 papers)
  5. Quentin Lhoest (9 papers)
  6. Anton Sinitsin (4 papers)
  7. Dmitry Popov (7 papers)
  8. Dmitry Pyrkin (1 paper)
  9. Maxim Kashirin (1 paper)
  10. Alexander Borzunov (7 papers)
  11. Albert Villanova del Moral (6 papers)
  12. Denis Mazur (5 papers)
  13. Ilia Kobelev (1 paper)
  14. Yacine Jernite (47 papers)
  15. Thomas Wolf (117 papers)
  16. Gennady Pekhimenko (52 papers)
Citations (47)

Summary

An Analytical Overview of "Distributed Deep Learning in Open Collaborations"

Introduction

The paper "Distributed Deep Learning in Open Collaborations" offers a methodological advancement in the field of deep learning by proposing a framework for training state-of-the-art models using collaborative and distributed resources. The research shines a light on the increasing computational demands of modern deep learning models and the resultant resource centralization in industrial and academic giants, necessitating a paradigm shift towards democratized access through collaborative efforts.

Key Contributions and Methodology

This research introduces Distributed Deep Learning in Open Collaborations (DeDLOC), a framework designed to tackle the unique challenges of distributed machine learning across heterogeneous devices in volunteer computing environments. The framework specifically addresses the high latency, asymmetric bandwidth, and intermittent availability of computational resources that characterize volunteer computing setups.

Significantly, the authors develop an adaptive averaging algorithm that interpolates between different data-parallel training strategies to maximize training throughput across diverse hardware. The algorithm dynamically assigns roles based on each device's compute and network capabilities, shifting between configurations reminiscent of parameter servers, All-Reduce, and decentralized gradient descent as conditions change.
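
To make the role-assignment idea concrete, the sketch below shows one highly simplified way such a decision could look. The `Peer` descriptor, the `assign_roles` function, and the fixed bandwidth threshold are all hypothetical illustrations, not DeDLOC's API; the actual algorithm chooses assignments so as to maximize overall training throughput given the measured capabilities of all participants.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Peer:
    """Hypothetical description of one participant's capabilities."""
    name: str
    compute: float      # samples/sec this device can process
    bandwidth: float    # usable network throughput, MB/s
    reachable: bool     # can other peers connect to it (e.g., public IP)?


def assign_roles(peers: List[Peer], min_server_bandwidth: float = 50.0) -> Dict[str, str]:
    """Assign each peer a role for the next averaging round.

    Well-connected, reachable peers help aggregate gradients (a role
    reminiscent of a parameter server or All-Reduce participant); the rest
    only send local gradients and receive the averaged result.
    """
    roles = {}
    for peer in peers:
        if peer.reachable and peer.bandwidth >= min_server_bandwidth:
            roles[peer.name] = "aggregator"   # participates in averaging
        else:
            roles[peer.name] = "client"       # delegates averaging to aggregators
    return roles


if __name__ == "__main__":
    swarm = [
        Peer("gpu-server", compute=220.0, bandwidth=400.0, reachable=True),
        Peer("home-desktop", compute=90.0, bandwidth=20.0, reachable=False),
        Peer("colab-notebook", compute=60.0, bandwidth=120.0, reachable=True),
    ]
    print(assign_roles(swarm))
```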

Additionally, DeDLOC keeps training synchronous: peers accumulate gradients until a target global batch size is reached, so each update is equivalent to a large-batch training step on a high-performance cluster regardless of how many peers participate. The paper also outlines a comprehensive system design for peer-to-peer coordination, addressing practical challenges such as NAT traversal and dataset streaming.
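
The sketch below is a minimal, single-process illustration of this large-batch scheme in PyTorch. The target batch size, the injected `all_reduce_gradients` callable, and the purely local sample counter are simplifying assumptions: in DeDLOC, progress toward the target batch is tracked collectively across peers and the averaging is performed by the adaptive protocol described above.

```python
import torch
from torch import nn

# Hypothetical target: number of samples the collaboration processes
# before one synchronous optimizer step (large-batch training).
TARGET_GLOBAL_BATCH = 4096


def collaborative_step(model, optimizer, local_batches, all_reduce_gradients):
    """Accumulate gradients over micro-batches, then synchronize and update.

    `all_reduce_gradients` stands in for collaboration-wide gradient
    averaging; it is injected here so the sketch stays self-contained.
    """
    samples_accumulated = 0
    optimizer.zero_grad()
    for inputs, targets in local_batches:
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()                      # gradients accumulate in .grad
        samples_accumulated += inputs.shape[0]
        if samples_accumulated >= TARGET_GLOBAL_BATCH:
            all_reduce_gradients(model)      # average gradients across peers
            optimizer.step()                 # identical update on every peer
            optimizer.zero_grad()
            samples_accumulated = 0


if __name__ == "__main__":
    model = nn.Linear(16, 4)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    # Eight fake micro-batches of 512 samples reach the 4096-sample target once.
    data = [(torch.randn(512, 16), torch.randint(0, 4, (512,))) for _ in range(8)]
    collaborative_step(model, opt, data, all_reduce_gradients=lambda m: None)
```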

Experimental Insights

The validity and applicability of DeDLOC are demonstrated through extensive experiments. The framework successfully facilitates unsupervised pretraining of models such as ALBERT-Large and SwAV in volunteer settings, achieving performance comparable to traditional clusters at a reduced computational cost.

Empirically, DeDLOC scales efficiently across scenarios that mix various GPU and CPU configurations, showcasing the adaptability and robustness of its communication strategy. A real-world training run in which 40 volunteers pretrained a Bengali LLM demonstrates its practical capabilities, achieving state-of-the-art results in Bengali language representation despite the instability and heterogeneity of volunteer hardware.

Implications and Future Research

DeDLOC's contributions extend beyond technical merits, holding significant implications for expanding access to deep learning model training to smaller research entities and independent researchers. This democratization could accelerate innovation and applications in fields traditionally hindered by computational barriers, thus fostering inclusivity in scientific progress.

Future development could focus on refining NAT traversal methods and enhancing fault tolerance through improved group-based averaging strategies. Additionally, more exploration into hybrid computational setups involving GPUs and TPUs could augment DeDLOC's versatility and effectiveness.

Conclusion

This paper positions DeDLOC as a compelling contribution to distributed deep learning, offering a robust framework that leverages volunteer computing to democratize the training of high-performance models. By addressing the multifaceted challenges in distributed training settings, it creates opportunities for more inclusive and environmentally considerate computational practices in artificial intelligence.
