Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing (1910.05316v1)

Published 4 Oct 2019 in cs.NI, cs.CV, cs.DC, and cs.LG

Abstract: As a key technology for enabling AI applications in the 5G era, Deep Neural Networks (DNNs) have quickly attracted widespread attention. However, it is challenging to run computation-intensive DNN-based tasks on mobile devices due to their limited computation resources. Worse still, traditional cloud-assisted DNN inference is heavily hindered by significant wide-area network latency, leading to poor real-time performance and low quality of user experience. To address these challenges, in this paper we propose Edgent, a framework that leverages edge computing for collaborative DNN inference through device-edge synergy. Edgent exploits two design knobs: (1) DNN partitioning, which adaptively partitions computation between device and edge in order to coordinate the powerful cloud resource and the proximal edge resource for real-time DNN inference; and (2) DNN right-sizing, which further reduces computing latency by exiting inference early at an appropriate intermediate DNN layer. In addition, considering the potential network fluctuation in real-world deployment, Edgent is properly designed to specialize for both static and dynamic network environments. Specifically, in a static environment where the bandwidth changes slowly, Edgent derives the best configurations with the assistance of regression-based prediction models, while in a dynamic environment where the bandwidth varies dramatically, Edgent generates the best execution plan through an online change point detection algorithm that maps the current bandwidth state to the optimal configuration. We implement the Edgent prototype on a Raspberry Pi and a desktop PC, and extensive experimental evaluations demonstrate Edgent's effectiveness in enabling on-demand low-latency edge intelligence.

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing

The paper entitled "Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing" presents a detailed exploration of the challenges and solutions associated with running computation-intensive deep neural networks (DNNs) on mobile devices. The authors introduce Edgent, a framework that leverages edge computing to accelerate DNN inference under the latency and resource constraints of mobile devices.

Key Contributions

The paper identifies two primary mechanisms to facilitate efficient DNN execution: DNN Partitioning and DNN Right-Sizing. DNN Partitioning involves distributing the computational workload between the mobile device and edge servers, optimizing resource utilization. DNN Right-Sizing employs an early-exit strategy, allowing the inference process to terminate at an intermediate layer, reducing processing time with an acceptable trade-off in accuracy.
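
To make the partitioning knob concrete, here is a minimal sketch, assuming per-layer latency estimates for device and edge are already available (the paper obtains such estimates from regression models, described below). All names and numbers are illustrative placeholders, not figures from the paper: the search simply enumerates every cut point and picks the one minimizing on-device compute plus activation upload plus edge compute.

```python
# Minimal sketch of latency-driven DNN partitioning. All numbers are
# hypothetical placeholders, not measurements from the paper. A partition
# point k means layers [0, k) run on the mobile device, the activation at
# the cut is uploaded, and layers [k, n) run on the edge server.

def best_partition(device_ms, edge_ms, out_bytes, input_bytes, bw_bytes_per_ms):
    """Return (partition_point, total_latency_ms) minimizing end-to-end latency."""
    n = len(device_ms)
    best_k, best_t = None, float("inf")
    for k in range(n + 1):
        # Data crossing the network: the raw input if everything is offloaded
        # (k == 0), nothing if everything stays local (k == n), otherwise the
        # activation produced by the last on-device layer. The (small) result
        # download is ignored for simplicity.
        if k == 0:
            payload = input_bytes
        elif k == n:
            payload = 0
        else:
            payload = out_bytes[k - 1]
        total = sum(device_ms[:k]) + payload / bw_bytes_per_ms + sum(edge_ms[k:])
        if total < best_t:
            best_k, best_t = k, total
    return best_k, best_t

# Toy 4-layer network with made-up predicted latencies and activation sizes.
device_ms = [12.0, 30.0, 45.0, 60.0]          # device is slow
edge_ms   = [1.0, 2.5, 3.0, 0.5]              # edge server is fast
out_bytes = [600_000, 150_000, 8_000, 4_000]  # activations shrink with depth
k, t = best_partition(device_ms, edge_ms, out_bytes,
                      input_bytes=1_200_000, bw_bytes_per_ms=500)  # ~4 Mbps
print(f"run layers [0, {k}) on-device, rest on edge: {t:.1f} ms end-to-end")
```

Because activations typically shrink with depth, a cut later in the network amortizes the upload cost, which is why a device-edge split can beat both pure-local and pure-offload execution.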

Methodology and Design

The authors delineate the operation of Edgent across three stages:

  1. Offline Configuration: Regression models are trained to predict layer-wise inference latency on both mobile devices and edge servers. Branchy DNN models equipped with multiple exit points are also trained to accommodate the early-exit mechanism.
  2. Online Tuning: An optimal co-inference plan is generated via runtime optimization, considering real-time bandwidth conditions and the predefined latency constraint. The plan specifies the exit point and partition point that maximize accuracy while satisfying the latency requirement (a simplified sketch of this search follows the list).
  3. Dynamic Environment Adaptation: For fluctuating network conditions, a configuration map is created using historical bandwidth trace data to record optimal decisions for different bandwidth states. This prevents latency violations and ensures responsive DNN inference.
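
A simplified sketch of the online tuning step follows, mirroring the paper's objective of maximizing accuracy subject to a latency requirement: exit branches are tried from most to least accurate, and the first branch that admits a partition meeting the deadline wins. The `Branch` structure, the `tune` function, and all numbers are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the online tuning step: search over (exit branch, partition point)
# and return the most accurate configuration whose predicted end-to-end
# latency meets the deadline. All data structures and numbers are
# illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass

@dataclass
class Branch:
    accuracy: float    # validation accuracy of this early exit
    device_ms: list    # predicted per-layer latency on the device
    edge_ms: list      # predicted per-layer latency on the edge
    out_bytes: list    # activation size at each layer boundary

def latency(br, k, input_bytes, bw_bytes_per_ms):
    """End-to-end latency with layers [0, k) on-device and [k, n) on the edge."""
    n = len(br.device_ms)
    payload = input_bytes if k == 0 else (0 if k == n else br.out_bytes[k - 1])
    return sum(br.device_ms[:k]) + payload / bw_bytes_per_ms + sum(br.edge_ms[k:])

def tune(branches, input_bytes, bw_bytes_per_ms, deadline_ms):
    """Most accurate (branch, partition) whose predicted latency fits the deadline."""
    for i, br in sorted(enumerate(branches), key=lambda p: -p[1].accuracy):
        n = len(br.device_ms)
        k, t = min(((k, latency(br, k, input_bytes, bw_bytes_per_ms))
                    for k in range(n + 1)), key=lambda p: p[1])
        if t <= deadline_ms:
            return i, k, t   # exit branch i, cut point k, predicted latency
    return None              # no configuration can meet the deadline

branches = [  # a shallow, less accurate exit and the full, most accurate one
    Branch(0.80, [12.0, 30.0], [1.0, 2.5], [600_000, 20_000]),
    Branch(0.91, [12.0, 30.0, 45.0, 60.0], [1.0, 2.5, 3.0, 0.5],
           [600_000, 150_000, 8_000, 4_000]),
]
plan = tune(branches, input_bytes=1_200_000, bw_bytes_per_ms=500, deadline_ms=120)
print(plan)  # under a 120 ms budget the full branch still fits: (1, 3, 103.5)
```

Tightening the deadline (say to 60 ms in this toy setup) makes the full branch infeasible, and the search falls through to the shallower exit, trading accuracy for latency exactly as the right-sizing knob intends.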

Evaluation and Results

The framework's effectiveness was evaluated with a prototype implementation on a Raspberry Pi and a desktop PC. Edgent consistently improved latency in both static and dynamic network conditions compared with traditional approaches, striking a balance between accuracy and latency.

  • Static Environment: Edgent maintains a high level of accuracy even under bandwidth restrictions, adapting the exit and partition points to the available bandwidth.
  • Dynamic Environment: The dynamic configurator outperformed the static version under fluctuating bandwidth, adapting to real-time changes while maintaining near-optimal performance (a rough sketch of this mechanism follows the list).
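
As a rough illustration of the dynamic configurator, and explicitly not the paper's exact online change point detection algorithm, the sketch below flags a large shift in the mean of a short sliding window of bandwidth samples as a change point and then snaps to the nearest bucket in a precomputed bandwidth-to-configuration map. The buckets, plans, and thresholds are hypothetical.

```python
# Rough illustration of the dynamic configurator (not the paper's exact
# online change-point detection): watch a sliding window of bandwidth
# samples, treat a large shift in the window mean as a change point, and
# look up the precomputed plan for the nearest bandwidth bucket.

from collections import deque

class DynamicConfigurator:
    def __init__(self, config_map, window=3, shift_ratio=0.3):
        self.config_map = config_map   # Mbps bucket -> (exit, partition)
        self.samples = deque(maxlen=window)
        self.shift_ratio = shift_ratio
        self.state_mbps = None         # mean bandwidth of the current state
        self.plan = None

    def observe(self, mbps):
        """Feed one bandwidth sample; return the active (exit, partition) plan."""
        self.samples.append(mbps)
        mean = sum(self.samples) / len(self.samples)
        changed = (self.state_mbps is None or
                   abs(mean - self.state_mbps) > self.shift_ratio * self.state_mbps)
        if changed:  # change point detected: re-read the configuration map
            self.state_mbps = mean
            bucket = min(self.config_map, key=lambda b: abs(b - mean))
            self.plan = self.config_map[bucket]
        return self.plan

# Hypothetical map from bandwidth buckets (Mbps) to (exit point, partition point).
cfg = DynamicConfigurator({1: (1, 2), 5: (2, 4), 10: (3, 6)})
for bw in [9.8, 10.1, 9.9, 4.7, 5.2, 5.0]:   # bandwidth drops mid-stream
    print(bw, "->", cfg.observe(bw))          # switches from (3, 6) to (2, 4)
```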

Implications and Future Work

This research paves the way for more responsive DNN inference in edge computing environments, crucial for applications requiring real-time processing, such as intelligent security and industrial automation. Future directions could explore integrating additional model compression techniques or extending Edgent to support multi-device environments, thereby broadening its applicability.

Conclusion

The paper successfully presents Edgent as a viable solution for enhancing real-time DNN inference via edge computing. By intelligently partitioning computational tasks and utilizing early-exit strategies, Edgent improves inference latency and efficiency, even in bandwidth-constrained settings. As edge computing becomes more prevalent, solutions like Edgent will increasingly be pivotal for deploying AI in mobile and edge environments.

Authors
  1. En Li
  2. Liekang Zeng
  3. Zhi Zhou
  4. Xu Chen