An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training (2312.11819v3)

Published 19 Dec 2023 in cs.LG, cs.AI, and cs.CL

Abstract: Recently, LLMs such as ChatGPT and InstructGPT have made a significant impact in the AI world. Many works have attempted to reproduce InstructGPT's complex training pipeline, namely Reinforcement Learning from Human Feedback (RLHF). However, mainstream distributed RLHF training methods typically adopt a fixed model placement strategy, referred to as the Co-located strategy. This strategy treats the four interdependent models involved in RLHF as a single entity, distributes them across all devices, and applies parallelism techniques designed for a single model, regardless of the workload heterogeneity of each model. As a result, it exacerbates the generation bottleneck in RLHF training and degrades overall training efficiency. To address these issues, we propose a flexible model placement framework that offers two general and agile placement strategies. The Interleaving strategy reduces the memory redundancy and communication costs of RLHF training by placing models without data dependencies on mutually exclusive devices with careful orchestration. The Disaggregated strategy improves training throughput by separating the training and inference runtimes of the RLHF pipeline using additional shadow models. Furthermore, our framework provides a simple user interface and guidelines for easily and flexibly configuring these strategies in various training scenarios. Our experiments show that our strategies achieve improvements of up to 11x over current state-of-the-art (SOTA) approaches. The results highlight the effectiveness and adaptability of our methods in accelerating distributed RLHF training.
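
To make the placement strategies concrete, here is a minimal, hypothetical Python sketch of how the Co-located, Interleaving, and Disaggregated strategies might map the four RLHF models onto device ranks. None of these names come from the paper's actual API, and the device-split ratios are illustrative assumptions, not the paper's configuration.

```python
# Hypothetical sketch contrasting the three placement strategies from the
# abstract. All class/function names are illustrative, not the paper's API.
from dataclasses import dataclass

# The four interdependent RLHF models: actor, critic, reward, reference.
RLHF_MODELS = ["actor", "critic", "reward", "reference"]

@dataclass
class Placement:
    """Maps each model to the set of device ranks it occupies."""
    name: str
    assignment: dict

def co_located(n_devices: int) -> Placement:
    # Baseline: every model is sharded across all devices, so each device
    # must hold shards of all four models regardless of workload.
    all_devices = set(range(n_devices))
    return Placement("co-located", {m: all_devices for m in RLHF_MODELS})

def interleaving(n_devices: int) -> Placement:
    # Models without data dependencies share no devices: e.g. the frozen
    # reward/reference models occupy one pool, actor/critic the other,
    # reducing per-device memory redundancy and communication.
    half = n_devices // 2
    pool_a, pool_b = set(range(half)), set(range(half, n_devices))
    return Placement("interleaving", {
        "actor": pool_a, "critic": pool_a,
        "reward": pool_b, "reference": pool_b,
    })

def disaggregated(n_devices: int) -> Placement:
    # Training and generation run on disjoint pools; a "shadow" replica of
    # the actor serves generation while the trainable actor stays on the
    # training pool (2:1 split is an arbitrary illustrative choice).
    split = (2 * n_devices) // 3
    train, gen = set(range(split)), set(range(split, n_devices))
    assignment = {m: train for m in RLHF_MODELS}
    assignment["actor_shadow"] = gen  # inference-only replica
    return Placement("disaggregated", assignment)

if __name__ == "__main__":
    for strategy in (co_located, interleaving, disaggregated):
        p = strategy(8)
        print(p.name)
        for model, devices in p.assignment.items():
            print(f"  {model:12s} -> ranks {sorted(devices)}")
```

Running the sketch on 8 ranks prints each strategy's model-to-device mapping, making it easy to see why Interleaving frees per-device memory and why Disaggregated decouples generation from training at the cost of an extra actor copy.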

Authors (9)
  1. Youshao Xiao
  2. Weichang Wu
  3. Zhenglei Zhou
  4. Fagui Mao
  5. Shangchun Zhao
  6. Lin Ju
  7. Lei Liang
  8. Xiaolu Zhang
  9. Jun Zhou
Citations (3)