Oases: Efficient Large-Scale Model Training on Commodity Servers via Overlapped and Automated Tensor Model Parallelism (2305.16121v2)
Abstract: Deep learning is seeing a rise in large-scale models. Because training such models is costly, researchers increasingly train them on commodity servers, which far more of the community can access. The massive number of parameters necessitates model-parallel training methods. Existing studies focus on pipeline model parallelism; however, tensor model parallelism (TMP) becomes inevitable as model sizes keep growing, and its frequent data-dependent communication and computation operations significantly reduce training efficiency. In this paper, we present Oases, an automated TMP method with overlapped communication that accelerates large-scale model training on commodity servers. Oases introduces a fine-grained schedule of training operations that maximizes the overlap of communication and computation even when they are data-dependent. Additionally, we design the Oases planner, which searches for the best TMP parameter-partition strategy to achieve further acceleration. Unlike existing methods, the Oases planner is tailored to model the cost of overlapped communication-computation operations. We evaluate Oases on various model settings and two commodity clusters, comparing it against four state-of-the-art implementations. Experimental results show that Oases achieves speedups of 1.01–1.48× over the fastest baseline and up to 1.95× over Megatron.
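The sketch below illustrates only the general principle the abstract describes, overlapping a tensor-model-parallel collective with independent computation, not the Oases schedule or planner themselves. It assumes a PyTorch setup with a tensor-parallel process group, and the helpers `attention_block`, `mlp_block.precompute_independent`, and `mlp_block.finish` are hypothetical placeholders introduced purely for illustration.

```python
# Minimal sketch (NOT the Oases implementation) of overlapping tensor-model-parallel
# communication with computation using asynchronous collectives in PyTorch.
import torch
import torch.distributed as dist


def tmp_layer_forward(x, attention_block, mlp_block, tp_group):
    # A row/column-parallel attention block produces a partial result that must
    # be all-reduced across the tensor-parallel group before the next block.
    partial = attention_block(x)

    # Launch the all-reduce asynchronously instead of blocking on it.
    handle = dist.all_reduce(partial, group=tp_group, async_op=True)

    # Overlap: run computation that does NOT depend on the reduced result while
    # the collective is in flight (hypothetical helper that only needs x).
    independent = mlp_block.precompute_independent(x)

    # Wait only at the point where the reduced tensor is actually consumed.
    handle.wait()
    return mlp_block.finish(partial, independent)  # hypothetical helper
```

Oases goes further by scheduling such overlaps at a finer operation granularity, including cases where communication and computation are data-dependent, and by letting its planner pick the partition strategy under a cost model that accounts for the overlap.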