Beyond Data and Model Parallelism for Deep Neural Networks
The paper "Beyond Data and Model Parallelism for Deep Neural Networks" by Zhihao Jia et al. offers a novel perspective on optimizing the parallelization of deep neural network (DNN) training. Traditional approaches, primarily data and model parallelism, often fall short in effectively managing the substantial computational demands required for modern DNNs. This work introduces a comprehensive search space for parallelization strategies termed SOAP, which encompasses Sample, Operation, Attribute, and Parameter dimensions. It further presents FlexFlow, a framework utilizing this search space to significantly enhance training throughput.
Key Contributions
The authors first delineate the limitations of the prevalent parallelization methods, notably their inefficiency for operations with large parameter sets, such as the large matrix multiplications in densely connected layers. Against this backdrop, FlexFlow emerges as a solution that explores parallelization strategies beyond the traditional confines of data and model parallelism, optimizing jointly across the SOAP dimensions. Pivotal to this is an execution simulator that produces fast and accurate performance predictions for candidate strategies; reportedly three orders of magnitude faster than measuring real executions, it makes the vastly expanded strategy space practical to explore.
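As a rough intuition for what such a simulator does, the sketch below estimates an iteration time from cached per-task compute measurements plus analytically derived transfer times. This is a heavy simplification under assumed inputs, not the paper's algorithm: the actual simulator builds a full task graph of compute and communication tasks and runs an event-driven simulation that can be updated incrementally between candidate strategies. The helpers `simulate_strategy`, `ops`, and `measured_task_time` are hypothetical.

```python
# Simplified cost model in the spirit of the paper's simulator: per-task
# compute times are measured once per operation/configuration and cached,
# while transfer times are derived from tensor sizes and link bandwidth.
def estimate_transfer_time(tensor_bytes: float, bandwidth_bytes_per_s: float) -> float:
    return tensor_bytes / bandwidth_bytes_per_s

def simulate_strategy(ops, strategy, measured_task_time, bandwidth):
    """Estimate the iteration time of `strategy` (hypothetical helper).

    ops: list of (name, dependencies, output_bytes) tuples in topological order.
    strategy: maps operation name -> configuration with a `.devices` list.
    measured_task_time: operation name -> cached compute time for its config.
    """
    out_bytes = {name: nbytes for name, _, nbytes in ops}
    finish = {}
    for name, deps, _ in ops:
        cfg = strategy[name]
        ready = 0.0
        for dep in deps:
            # If a producer ran on different devices, charge a transfer
            # before this operation can start consuming its output.
            comm = 0.0
            if set(strategy[dep].devices) != set(cfg.devices):
                comm = estimate_transfer_time(out_bytes[dep], bandwidth)
            ready = max(ready, finish[dep] + comm)
        finish[name] = ready + measured_task_time[name]
    return max(finish.values()) if finish else 0.0
```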
Experimental Validation
The evaluation covers six DNN benchmarks on two GPU clusters and underscores FlexFlow's efficacy. Training throughput improves by up to 3.8 times over established methods, including data parallelism and expert-engineered strategies. The strategies FlexFlow discovers also execute up to 2.3 times faster than handcrafted expert designs and scale better to larger numbers of devices.
Analytical Observations
The execution simulator's accuracy was validated: real and simulated execution times differ by less than 30%, and, more importantly, the relative ordering of strategies is preserved, which is what the search actually depends on. The execution optimizer then employs a Markov chain Monte Carlo (MCMC) method, using simulated execution times as its cost signal to navigate the expansive SOAP search space efficiently.
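The search itself can be captured in a few lines. The sketch below follows the Metropolis-style scheme the paper describes: each step re-randomizes one operation's configuration, scores the proposal with the simulator, always accepts improvements, and accepts regressions with a probability that decays with the simulated slowdown. `random_config` is a hypothetical helper, and `simulate_strategy` here is any callable that scores a whole strategy (for example, a closure over a cost model like the one sketched above).

```python
# Minimal sketch of an MCMC search over parallelization strategies, assuming
# a simulator callable and a helper that draws a random configuration for an
# operation. FlexFlow's actual optimizer is more elaborate, but the proposal
# and acceptance logic follow the same Metropolis pattern.
import math
import random

def mcmc_search(initial_strategy, simulate_strategy, random_config,
                iterations: int = 10_000, beta: float = 0.05):
    current = dict(initial_strategy)
    current_cost = simulate_strategy(current)
    best, best_cost = dict(current), current_cost
    for _ in range(iterations):
        proposal = dict(current)
        op = random.choice(list(proposal))   # pick one operation at random
        proposal[op] = random_config(op)     # re-randomize its configuration
        cost = simulate_strategy(proposal)
        # Always accept improvements; accept slower proposals with a
        # probability that shrinks exponentially in the simulated slowdown,
        # which lets the search escape local minima.
        if cost < current_cost or random.random() < math.exp(beta * (current_cost - cost)):
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = dict(proposal), cost
    return best, best_cost
```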
Implications and Future Directions
The implications of this work span both theory and practice. Theoretically, it challenges the status quo of DNN parallelization and provides a principled framework for exploring a much broader space of parallelization strategies. Practically, FlexFlow's automated search enables better utilization of computational resources, potentially reducing training time and cost in real-world deployments.
Future research could further improve the simulator's accuracy and extend FlexFlow to emerging hardware architectures. Exploring how FlexFlow's strategies interact with other optimization techniques, such as learning rate schedules and data augmentation, could also yield richer insights.
In summary, this paper presents a thorough and innovative approach to DNN parallelization, combining theoretical insights with practical implementations to push beyond conventional data and model parallelism boundaries.