AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs (2511.00796v1)

Published 2 Nov 2025 in cs.DC and cs.LG

Abstract: Maximizing training throughput and cost-efficiency of RL for LLMs is essential to democratize this advanced technique. One promising but challenging approach is to deploy such a computational workflow over heterogeneous GPUs. Unlike conventional large-scale LLM pretraining, RL training generally decomposes into three coupled stages, i.e., rollout generation, reward computation, and policy/value updates, which exhibit markedly different compute intensities, memory footprints, and communication patterns. Recent research shows that fully asynchronous RL training can disaggregate these stages across disjoint hardware pools without sacrificing training stability, creating a great opportunity for real-world heterogeneous deployment. To this end, we present AReaL-Hex, a heterogeneity-aware asynchronous RL training system that effectively schedules how to execute rollout generation and policy model training over heterogeneous GPUs while enforcing data staleness bounds. Concretely, we use a two-phase scheduler: (i) a constrained search with MILP to select per-stage parallelization strategies and workload assignments given a resource budget, and (ii) a graph-partitioning step that allocates heterogeneous GPUs and interconnects to maximize end-to-end throughput. Built atop a fully asynchronous RL architecture, AReaL-Hex maps HBM-I/O-bound generation and compute-bound optimization to more cost-efficient resources and balances their producer-consumer interactions to avoid both idleness and stale rollout trajectories. On the mathematical reasoning task with various model scales (1.5B, 7B, and 14B), compared to homogeneous deployments of state-of-the-art asynchronous RL systems: (i) When maintaining the same total budgets, AReaL-Hex delivers up to 1.50x higher training throughput; (ii) When achieving the same training throughput, AReaL-Hex results in up to 1.46x reduction in training cost.

Summary

  • The paper presents AReaL-Hex, a system that improves asynchronous RL training by matching heterogeneous GPU resources to the computational demands of each RL stage.
  • It introduces a two-phase scheduling algorithm that combines MILP-based constrained search with graph partitioning to manage rollout generation and training workloads.
  • Performance evaluations demonstrate up to 2.76× higher throughput than homogeneous H20 GPU clusters and up to 1.46× lower training cost at matched throughput.

AReaL-Hex: Accommodating Asynchronous RL Training over Heterogeneous GPUs

Introduction

The paper presents AReaL-Hex, a system designed to support asynchronous RL training in heterogeneous GPU environments. RL training for LLMs is challenging because its stages (rollout generation, reward computation, and model updates) place markedly different demands on compute, memory, and communication. AReaL-Hex addresses these challenges with a heterogeneity-aware scheduling algorithm that allocates resources and maximizes execution throughput while enforcing constraints on data freshness (Figure 1).

Figure 1: Execution latency of rollout inference (abbreviated as Inf) and model training (abbreviated as Train) under homogeneous settings 1 and 2 and the heterogeneous setting, across different model scales.

System Design and Scheduling Algorithm

The system uses a two-phase scheduling algorithm. The first phase performs a constrained search with mixed-integer linear programming (MILP) to select per-stage parallelization strategies and workload assignments under a given resource budget. The second phase applies graph partitioning to allocate heterogeneous GPUs and interconnects to those stages, maximizing end-to-end throughput while balancing the asynchronous producer-consumer interaction between rollout generation and training.
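
A minimal sketch of how such a two-phase scheduler could be structured is shown below. The GPU specifications are approximate public numbers, the cost models and per-sample constants are invented for illustration, and brute-force enumeration stands in for the paper's MILP and graph-partitioning steps; this is not the authors' implementation.

# Illustrative two-phase scheduler sketch (an assumption-laden toy, not the paper's code).
# Phase 1: enumerate per-stage parallelization plans under a memory budget
#          (a brute-force stand-in for the MILP-based constrained search).
# Phase 2: assign heterogeneous GPU pools to rollout generation and training
#          (a stand-in for the graph-partitioning step).
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class GPUPool:
    name: str
    tflops: float     # approximate peak BF16 compute per GPU
    hbm_tbps: float   # approximate HBM bandwidth per GPU (TB/s)
    mem_gb: float
    count: int

@dataclass(frozen=True)
class Plan:
    dp: int   # data-parallel degree
    pp: int   # pipeline-parallel degree
    tp: int   # tensor-parallel degree

def feasible_plans(pool, model_mem_gb):
    # Phase 1: (dp, pp, tp) factorizations of the pool whose per-GPU shard fits in memory.
    for dp, pp, tp in product(range(1, pool.count + 1), repeat=3):
        if dp * pp * tp == pool.count and model_mem_gb / (pp * tp) <= pool.mem_gb:
            yield Plan(dp, pp, tp)

def train_rate(pool, plan, tflops_per_sample):
    # Toy compute-bound model: communication penalty grows with tp and pp degrees.
    penalty = 1.0 + 0.05 * (plan.tp - 1) + 0.02 * (plan.pp - 1)
    return pool.count * pool.tflops / (tflops_per_sample * penalty)

def rollout_rate(pool, gb_per_sample):
    # Toy HBM-bound model for autoregressive decoding (samples per second).
    return pool.count * pool.hbm_tbps * 1000.0 / gb_per_sample

def schedule(pools, model_mem_gb, tflops_per_sample, gb_per_sample):
    # Phase 2: try every pool-to-stage assignment; the end-to-end rate of the
    # producer-consumer pipeline is bounded by the slower of the two stages.
    best = None
    for gen_pool, trn_pool in product(pools, repeat=2):
        if gen_pool is trn_pool:
            continue
        for plan in feasible_plans(trn_pool, model_mem_gb):
            rate = min(rollout_rate(gen_pool, gb_per_sample),
                       train_rate(trn_pool, plan, tflops_per_sample))
            if best is None or rate > best[0]:
                best = (rate, gen_pool.name, trn_pool.name, plan)
    return best

# Hypothetical pools and per-sample constants, chosen only so the example runs.
pools = [GPUPool("H20", 148.0, 4.0, 96.0, 32),
         GPUPool("H800", 989.0, 3.35, 80.0, 24)]
print(schedule(pools, model_mem_gb=28.0, tflops_per_sample=0.3, gb_per_sample=0.5))

The structural point the toy tries to capture is that in an asynchronous producer-consumer pipeline the end-to-end rate is limited by the slower stage, so the pool assignment and the parallel plan have to be co-optimized rather than chosen independently.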

Resource Allocation

The scheduling algorithm divides the GPU resources into two distinct sets: one for rollout generation and another for model training. This separation lets the system allocate GPUs with high compute throughput to the compute-intensive training tasks and GPUs with high memory bandwidth to the memory-bound rollout generation tasks.
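
The pool-assignment step of the earlier sketch could also be driven by a simple heuristic of this kind; the compute-to-bandwidth ratio and median cutoff below are assumptions for illustration, not the paper's policy.

# Hypothetical pool-splitting heuristic: GPU types whose compute-to-bandwidth
# ratio exceeds the fleet median go to training, the rest to rollout generation.
from statistics import median

def split_pools(gpu_specs):
    # gpu_specs maps a GPU name to (peak TFLOPS, HBM bandwidth in TB/s).
    ratio = {name: tflops / bw for name, (tflops, bw) in gpu_specs.items()}
    cut = median(ratio.values())
    train = [name for name, r in ratio.items() if r > cut]
    rollout = [name for name, r in ratio.items() if r <= cut]
    return train, rollout

print(split_pools({"H800": (989.0, 3.35), "H20": (148.0, 4.0)}))
# -> (['H800'], ['H20']): high-compute GPUs train, high-bandwidth GPUs generate rollouts.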

A key feature of AReaL-Hex is its parallel strategy search, which identifies suitable configurations for data, pipeline, and tensor model parallelism. The algorithm constrains each parallel group to GPUs of the same type, which minimizes communication overhead and avoids stragglers within a group.
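
One way such a same-type restriction can be written into a MILP is sketched below; the decision variables and cost terms are illustrative assumptions consistent with the description above, not necessarily the paper's exact formulation.

\[
\begin{aligned}
\max_{x,\;T}\quad & T \\
\text{s.t.}\quad & \textstyle\sum_{g}\sum_{p} x_{s,g,p} = 1 && \forall s \in \{\text{gen},\ \text{train}\}, \\
& \textstyle\sum_{s}\sum_{p} n_{p}\, x_{s,g,p} \le N_{g} && \forall g, \\
& T \le \textstyle\sum_{g}\sum_{p} r_{s,g,p}\, x_{s,g,p} && \forall s, \\
& x_{s,g,p} \in \{0,1\}.
\end{aligned}
\]

Here x_{s,g,p} = 1 selects parallel plan p = (dp, pp, tp) on GPU type g for stage s, so each stage runs on exactly one GPU type; n_p = dp·pp·tp is the number of GPUs the plan requires, N_g is the number of GPUs of type g available, r_{s,g,p} is a profiled throughput estimate, and maximizing T raises the throughput of the slowest stage.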

Performance Evaluation and Results

In experiments comparing AReaL-Hex against homogeneous configurations, the heterogeneous setup consistently achieved higher throughput, delivering up to 2.76× the throughput of homogeneous H20 GPU clusters.

Breakdown Analysis

The performance breakdown shows that heterogeneous GPU clusters reduce rollout generation latency, training execution latency, and cost. Even when matched to the same training throughput, AReaL-Hex on a heterogeneous cluster executes the asynchronous RL workflow more efficiently than homogeneous configurations (Figure 2).

Figure 2: We present a breakdown of experiments comparing AReaL-Hex running on a 56-GPU heterogeneous cluster against AReaL running on a 24-GPU H800 homogeneous cluster.

Cost Efficiency

Cost analysis shows that AReaL-Hex operates at substantially lower cost than homogeneous setups while maintaining comparable performance, demonstrating the economic advantage of heterogeneous deployments and significantly reducing the operational cost per training task (Figure 3).

Figure 3: Case study of AReaL-Hex's cost-efficiency across cluster sizes ranging from 24 to 56 GPUs. For the H20 and H800 per-GPU-hour costs, we follow prior practice.
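
As a back-of-the-envelope illustration of this kind of cost comparison, the sketch below computes dollars per thousand training samples; the hourly prices, GPU counts, and throughput figures are placeholders, not the paper's measured numbers.

# Cost per thousand trained samples = (hourly cluster cost) / (samples per hour) * 1000.
# All prices, counts, and throughputs below are placeholders for illustration.
def cost_per_1k_samples(gpu_counts, price_per_hour, samples_per_hour):
    hourly_cost = sum(n * price_per_hour[name] for name, n in gpu_counts.items())
    return 1000.0 * hourly_cost / samples_per_hour

hetero = cost_per_1k_samples({"H20": 32, "H800": 24},
                             {"H20": 2.0, "H800": 5.0}, samples_per_hour=9000)
homo = cost_per_1k_samples({"H800": 48}, {"H800": 5.0}, samples_per_hour=9000)
print(f"heterogeneous: ${hetero:.2f}  homogeneous: ${homo:.2f} per 1k samples")

At equal throughput, the comparison reduces to the hourly price of the cluster, which is where routing the bandwidth-bound generation stage onto cheaper, high-bandwidth GPUs pays off.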

Conclusion

AReaL-Hex leverages heterogeneous GPU environments to improve RL training efficiency and reduce cost. Its two-phase scheduler aligns GPU resources with the varying computational demands of each stage, yielding higher throughput and more economical execution of large-scale LLM RL workloads. The scheduling mechanism and the deliberate placement of heterogeneous resources deliver clear improvements over homogeneous setups, making RL training at scale more accessible and cost-effective.
