DrJAX: Scalable and Differentiable MapReduce Primitives in JAX (2403.07128v2)

Published 11 Mar 2024 in cs.DC and cs.LG

Abstract: We present DrJAX, a JAX-based library designed to support large-scale distributed and parallel machine learning algorithms that use MapReduce-style operations. DrJAX leverages JAX's sharding mechanisms to enable native targeting of TPUs and state-of-the-art JAX runtimes, including Pathways. DrJAX embeds building blocks for MapReduce computations as primitives in JAX. This enables three key benefits. First, DrJAX computations can be translated directly to XLA HLO, enabling flexible integration with a wide array of ML training platforms. Second, DrJAX computations are fully differentiable. Last, DrJAX computations can be interpreted out to existing batch-processing compute systems, including traditional MapReduce systems like Apache Beam and cross-device compute systems like those powering federated learning applications. We show that DrJAX provides an easily programmable, performant, and scalable framework for parallelized algorithm development. DrJAX is available at https://github.com/google-research/google-research/tree/master/drjax.


Summary

  • The paper introduces DrJAX, a library that embeds MapReduce-style federated learning primitives in JAX to enable scalable, efficient distributed ML.
  • It leverages JAX's JIT compilation and sharding mechanisms to achieve near-constant weak scaling when training models of up to 8 billion parameters.
  • By preserving data-placement information in its computations, DrJAX bridges research prototypes and production federated learning systems.

DrJAX: Integrating Federated Learning Primitives into JAX for Scalable Distributed ML

Introduction

Progress in large-scale ML depends heavily on the ability to distribute computations across many compute nodes. This paper introduces DrJAX, a software library that embeds federated learning (FL) computations into JAX, using JAX's primitive mechanism to make federated computations scalable and efficient. Federated learning is an ML paradigm in which multiple clients collaborate to train a model without sharing raw data. Running such computations efficiently in data centers is essential both for accelerating FL research and for deploying FL algorithms in practice. DrJAX is designed for performance, scalability, and ease of programming: its computations translate directly to XLA HLO and can be interpreted by production cross-device federated compute systems.

System Design

DrJAX integrates federated computations into JAX as primitives. This design rests on two observations: most FL computations resemble ordinary distributed ML workloads, and federated automatic differentiation (AD) can be implemented by tracking data placement. Treating data locations as first-class citizens lets DrJAX manage federated values, distinguishing values placed on clients from values placed on the server. The library provides a small set of federated building blocks, including federated broadcast, federated map, and federated sum, which form the backbone of FL algorithms. These building blocks preserve placement information, allowing differentiation through federated computations without losing track of where data resides.
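
To make these building blocks concrete, the minimal sketch below expresses federated broadcast, map, and sum as plain JAX operations over an explicit leading clients dimension. The function names and the fixed client count are illustrative assumptions, not the DrJAX API.

```python
# Minimal sketch of MapReduce-style federated building blocks in plain JAX.
# Function names and NUM_CLIENTS are illustrative; this is not the DrJAX API.
import jax
import jax.numpy as jnp

NUM_CLIENTS = 4  # assumed number of clients in this toy example

def federated_broadcast(server_value):
    # Server -> clients: replicate a server-placed array along a new
    # leading clients dimension.
    return jnp.broadcast_to(server_value, (NUM_CLIENTS,) + server_value.shape)

def federated_map(fn, clients_value):
    # Clients -> clients: apply fn independently to each client's slice.
    return jax.vmap(fn)(clients_value)

def federated_sum(clients_value):
    # Clients -> server: aggregate by reducing over the clients dimension.
    return jnp.sum(clients_value, axis=0)

# One toy "round": broadcast a model, compute per-client updates, aggregate.
model = jnp.ones((3,))
updates = federated_map(lambda m: 0.5 * m, federated_broadcast(model))
aggregate = federated_sum(updates)  # shape (3,)
```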

Implementation

DrJAX represents federated values as JAX arrays with an extra leading dimension that encodes placement, which lets federated operations be expressed through JAX's primitives mechanism. Because federated computations are embedded directly in JAX, DrJAX inherits JIT compilation and AD, improving data-center performance and scalability. DrJAX also attaches sharding annotations that guide compilers such as GSPMD in distributing computations across devices. This approach supports efficient, scalable federated training of large models and extends to a broader range of parallel ML computations beyond FL.
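
The sketch below illustrates this representation: a clients-placed value is an array whose leading axis indexes clients, and a sharding constraint asks the compiler to split that axis across devices. The mesh construction and axis names are assumptions made for illustration, not DrJAX internals.

```python
# Sketch of the "extra leading dimension" representation plus a sharding
# hint for the compiler. Mesh setup and axis names are illustrative only.
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())
mesh = Mesh(devices, axis_names=("clients",))

num_clients = len(devices)
clients_value = jnp.ones((num_clients, 3))  # leading axis indexes clients

@jax.jit
def aggregate(x):
    # Ask the compiler to shard the clients axis across devices, then reduce
    # it away to produce a server-placed (replicated) result.
    x = jax.lax.with_sharding_constraint(x, NamedSharding(mesh, P("clients")))
    return jnp.sum(x, axis=0)

server_value = aggregate(clients_value)  # shape (3,)
```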

Scalability and Efficiency

The paper presents empirical results showing that DrJAX enables efficient, scalable federated training of LLMs ranging from 350 million to 8 billion parameters. By sharding computations and relying on JAX's JIT compilation, DrJAX achieves near-constant weak scaling, a key indicator of its ability to handle large-scale federated workloads. The experiments also show that JIT-compiled DrJAX computations outperform naive for-loop implementations and that DrJAX's internal sharding annotations are necessary for reaching peak performance.
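
The toy comparison below illustrates why a single jit-compiled, vmapped round tends to scale better than a Python loop that dispatches one small computation per client. The sizes and update rule are arbitrary stand-ins, not the paper's benchmark setup.

```python
# Toy contrast between a naive Python loop over clients and a single
# jit-compiled, vmapped round. Sizes and timings are illustrative only.
import time
import jax
import jax.numpy as jnp

NUM_CLIENTS, DIM = 256, 1024
client_data = jnp.ones((NUM_CLIENTS, DIM))
model = jnp.zeros((DIM,))

def local_update(model, data):
    return model - 0.1 * (model - data)  # stand-in for local training

def round_loop(model, data):
    # Dispatches one small computation per client.
    total = jnp.zeros_like(model)
    for i in range(NUM_CLIENTS):
        total = total + local_update(model, data[i])
    return total / NUM_CLIENTS

@jax.jit
def round_vmapped(model, data):
    # One fused computation across all clients.
    return jnp.mean(jax.vmap(lambda d: local_update(model, d))(data), axis=0)

round_vmapped(model, client_data).block_until_ready()  # warm up compilation
t0 = time.perf_counter()
round_loop(model, client_data).block_until_ready()
t1 = time.perf_counter()
round_vmapped(model, client_data).block_until_ready()
t2 = time.perf_counter()
print(f"python loop: {t1 - t0:.4f}s, jit + vmap: {t2 - t1:.4f}s")
```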

Integration with Production Systems

A significant advantage of DrJAX is that its computations retain data-location information, which makes it possible to translate them into representations understood by production federated learning systems. Because DrJAX relies on JAX's primitives mechanism, the structure of a federated computation, including placement decisions and cross-machine communication patterns, is preserved in the traced program. Production systems can therefore interpret DrJAX computations directly, bridging the gap between research prototypes and deployable federated learning applications.
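
As a rough illustration of the mechanism, the sketch below registers a placement-aware aggregation as a custom JAX primitive using JAX's standard custom-primitive recipe. The primitive's name then remains visible in the traced program (jaxpr), where a separate interpreter could rewrite it for a cross-device runtime. This is not DrJAX's actual registration code.

```python
# Register a placement-aware aggregation as a JAX primitive so it stays
# visible in the jaxpr for later reinterpretation. Illustrative only.
import jax
import jax.numpy as jnp
from jax import core
from jax.interpreters import mlir

federated_sum_p = core.Primitive("federated_sum")

def federated_sum(clients_value):
    return federated_sum_p.bind(clients_value)

# Data-center semantics: reduce over the leading clients axis.
federated_sum_p.def_impl(lambda x: jnp.sum(x, axis=0))

# Shape rule: aggregation removes the clients axis.
federated_sum_p.def_abstract_eval(
    lambda x: core.ShapedArray(x.shape[1:], x.dtype))

# Lower to the same XLA computation as jnp.sum when JIT-compiling.
mlir.register_lowering(
    federated_sum_p,
    mlir.lower_fun(lambda x: jnp.sum(x, axis=0), multiple_results=False))

# The primitive remains visible by name in the traced program, so another
# backend could map it to, e.g., a secure aggregation call on real devices.
print(jax.make_jaxpr(federated_sum)(jnp.ones((4, 3))))
```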

Future Directions

While DrJAX makes significant strides in integrating federated learning within JAX, areas for further development include extending federated AD to non-linear communication primitives and supporting more complex, hierarchical data-placement strategies. Developing mature interpreters that translate DrJAX computations into the formats required by specific production platforms is another avenue for broadening DrJAX's applicability in real-world federated learning.

Conclusion

DrJAX is a notable contribution to distributed and federated machine learning, offering a scalable, efficient, and programmable framework for federated computations. By building on JAX and taking a principled approach to data placement and automatic differentiation, DrJAX opens new possibilities for developing and deploying FL algorithms. Its potential to accelerate research and to bridge the gap to production systems makes it a valuable tool for advancing federated learning.