
Auto-Split: A General Framework of Collaborative Edge-Cloud AI (2108.13041v1)

Published 30 Aug 2021 in cs.LG and cs.AI

Abstract: In many industry scale applications, large and resource consuming machine learning models reside in powerful cloud servers. At the same time, large amounts of input data are collected at the edge of cloud. The inference results are also communicated to users or passed to downstream tasks at the edge. The edge often consists of a large number of low-power devices. It is a big challenge to design industry products to support sophisticated deep model deployment and conduct model inference in an efficient manner so that the model accuracy remains high and the end-to-end latency is kept low. This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. This patented technology is already validated on selected applications, is on its way for broader systematic edge-cloud application integration, and is being made available for public use as an automated pipeline service for end-to-end cloud-edge collaborative intelligence deployment. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.

A Technical Exploration of Auto-Split: A Framework for Collaborative Edge-Cloud AI

The paper presents Auto-Split, a framework for efficiently deploying Deep Neural Networks (DNNs) in industrial applications where resources are distributed between edge devices and cloud servers. The framework addresses the challenges posed by rapidly growing model sizes and the large volumes of data collected at the network edge.

Key Contributions

Auto-Split proposes a novel approach by integrating DNN splitting with post-training quantization. The primary objective is to reduce end-to-end latency without significant accuracy loss. By doing so, it surpasses existing techniques such as uniform quantization and conventional edge-cloud inference methods. Specifically, Auto-Split partitions a model into an "edge DNN" that operates on low-power edge devices and a "cloud DNN" processed on powerful cloud servers. The method seeks an optimal balance between model accuracy, device constraints, transmission costs, and latency.
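The core idea of picking a split point can be illustrated with a minimal sketch: enumerate candidate cut layers, charge each candidate for edge compute, activation transmission, and cloud compute, and keep the cheapest feasible option. This is an illustrative assumption, not the paper's actual algorithm; the `Layer` profile, `best_split` helper, and all numbers are invented here, and the real Auto-Split additionally assigns mixed-precision bit-widths jointly with the split point, which this sketch omits.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    edge_ms: float    # latency if this layer runs on the edge device
    cloud_ms: float   # latency if this layer runs on the cloud server
    out_mb: float     # size of this layer's output activation (MB)
    weight_mb: float  # size of this layer's weights (MB)

def best_split(layers, uplink_mbps, edge_mem_mb, input_mb=0.5):
    """Return (i, latency_ms) where layers[:i] run on the edge and
    layers[i:] on the cloud; i == 0 is cloud-only, i == len(layers)
    is edge-only (no activation is uplinked in that case)."""
    # Baseline: ship the raw input to the cloud and run everything there.
    tx = input_mb * 8 / uplink_mbps * 1000
    best = (0, tx + sum(l.cloud_ms for l in layers))
    edge_ms = edge_mem = 0.0
    for i, layer in enumerate(layers, start=1):
        edge_ms += layer.edge_ms
        edge_mem += layer.weight_mb
        if edge_mem > edge_mem_mb:
            break  # edge partition no longer fits on the device
        tx = 0.0 if i == len(layers) else layer.out_mb * 8 / uplink_mbps * 1000
        total = edge_ms + tx + sum(l.cloud_ms for l in layers[i:])
        if total < best[1]:
            best = (i, total)
    return best

# Toy per-layer profile: a slow uplink makes a mid-network cut attractive.
layers = [
    Layer(edge_ms=5,   cloud_ms=1, out_mb=0.20, weight_mb=1),
    Layer(edge_ms=10,  cloud_ms=2, out_mb=0.05, weight_mb=2),
    Layer(edge_ms=200, cloud_ms=3, out_mb=0.01, weight_mb=8),
]
split, latency = best_split(layers, uplink_mbps=10, edge_mem_mb=16)
# With this profile the search keeps the first two layers on the edge.
```

The sketch already shows why the problem is non-trivial: the best cut sits where activations are small and the remaining layers are still expensive on the edge, and the answer shifts with bandwidth and device memory.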

Numerical Results and Claims

The authors substantiate their claims through extensive experimentation with various DNN architectures such as ResNet-50, MobileNet, and detection models like YOLOv3. The results indicate that Auto-Split can achieve a latency reduction of 20-80% compared to state-of-the-art techniques like QDMP, while maintaining, or even improving, the accuracy of the models involved. Furthermore, it demonstrates the capability to deliver more than 40% reduction in model size requirements at the edge, highlighting its practical utility in real-world applications.

Practical and Theoretical Implications

From a practical standpoint, Auto-Split is poised to drive the development of AI applications that are responsive and capable of on-device intelligence, essential for applications with stringent latency and privacy requirements, such as autonomous vehicles, smart cities, and IoT devices. Theoretically, this framework could inspire further research in optimizing model partitioning strategies and the innovative use of mixed-precision quantization in collaborative computing environments.

Speculations on AI Developments

The introduction of Auto-Split could catalyze improvements in distributed system design and edge-intelligence ecosystems. The principles established in this paper might pave the way for advanced hybrid models that exploit high-level abstractions and domain-specific knowledge to further refine edge-cloud collaborative intelligence.

Conclusion

In summary, the Auto-Split paper contributes substantially to the domain of distributed AI by offering a versatile and efficient framework. It has significant implications for embedding deep learning capabilities into edge-cloud architectures, promising to enhance both computational efficiency and application versatility. Future research could expand upon these findings to encompass broader applications and system designs, further closing the gap between model complexity and the computational capabilities available at the edge.

Authors (6)
  1. Amin Banitalebi-Dehkordi
  2. Naveen Vedula
  3. Jian Pei
  4. Fei Xia
  5. Lanjun Wang
  6. Yong Zhang
Citations (75)