Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy
The research paper "Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy" by En Li, Zhi Zhou, and Xu Chen proposes a collaborative approach to deep neural network (DNN) inference that splits computation between resource-constrained mobile devices and nearby edge servers. This addresses the performance and energy bottlenecks of executing DNNs entirely on mobile devices. The proposed framework, termed Edgent, introduces two primary mechanisms, DNN partitioning and DNN right-sizing, to achieve low-latency edge intelligence on demand.
Key Contributions
- DNN Partitioning: Edgent adaptively partitions the DNN computation between the mobile device and the edge server. By choosing the partition point according to the current network bandwidth and the predicted per-layer latencies, the framework exploits proximate computing resources to minimize end-to-end latency for real-time DNN inference (see the latency-model sketch after this list).
- DNN Right-Sizing: This mechanism further accelerates inference by allowing the DNN to exit early at an intermediate layer when the latency budget is tight, trading a tolerable loss of accuracy for lower latency. This is particularly useful for mission-critical applications with hard deadlines, where a moderate accuracy sacrifice is acceptable to meet stringent latency requirements; the joint exit-and-partition search is sketched under Framework Architecture below.
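The partitioning decision rests on a simple latency model: run the first p layers on the device, upload the resulting intermediate feature map, and run the remaining layers on the edge server. Below is a minimal Python sketch of that model. The function name, the list layout, and all numbers are illustrative assumptions, not the paper's code; in Edgent, the per-layer latencies would come from the performance models built during offline profiling.

```python
def co_inference_latency(device_lat, edge_lat, trans_bytes, p, bandwidth):
    """Estimated end-to-end latency when layers [0, p) run on the mobile
    device and layers [p, n) run on the edge server.

    device_lat[i]  -- predicted latency of layer i on the device (seconds)
    edge_lat[i]    -- predicted latency of layer i on the edge (seconds)
    trans_bytes[p] -- bytes uploaded when cutting at point p (the raw
                      input if p == 0, else layer p-1's output feature map)
    bandwidth      -- current uplink bandwidth (bytes/second)
    """
    n = len(device_lat)
    device_time = sum(device_lat[:p])
    # p == n means device-only execution, so nothing is uploaded.
    upload_time = trans_bytes[p] / bandwidth if p < n else 0.0
    edge_time = sum(edge_lat[p:])
    return device_time + upload_time + edge_time


# Illustrative three-layer model with made-up profiling numbers.
device_lat  = [0.040, 0.250, 0.015]              # seconds per layer, device
edge_lat    = [0.004, 0.010, 0.002]              # seconds per layer, edge
trans_bytes = [600_000, 150_000, 40_000, 4_000]  # upload size per cut point
bandwidth   = 1_000_000                          # 1 MB/s uplink

# Exhaustively evaluate every cut point and keep the fastest plan.
best_p = min(range(len(device_lat) + 1),
             key=lambda p: co_inference_latency(device_lat, edge_lat,
                                                trans_bytes, p, bandwidth))
```

With these made-up numbers the search settles on p = 1: running the first layer on the device shrinks the upload enough to beat both device-only and edge-only execution, illustrating how the optimal cut shifts with bandwidth.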
Framework Architecture
The proposed Edgent framework operates in three stages: offline training, online optimization, and co-inference. In the offline stage, latency-prediction models are built from per-layer profiling on both the device and the edge server, and the DNN is trained with multiple exit points. In the online optimization stage, Edgent jointly selects the exit point and the partition point that maximize inference accuracy subject to the specified latency requirement, given the current network conditions and predicted layer latencies. Finally, in the co-inference stage, the edge server and the mobile device execute the DNN collaboratively according to the chosen plan.
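The online optimization stage can be pictured as a joint search over exit points and partition points. A hedged sketch follows, reusing the co_inference_latency helper from the earlier sketch; the select_plan name, the dictionary layout of the per-branch profiles, and the assumption that branches arrive sorted from the deepest (most accurate) exit to the shallowest are illustrative, not the paper's implementation.

```python
def select_plan(branches, latency_budget, bandwidth):
    """Jointly choose an exit point and a partition point.

    branches -- per-exit profiles, assumed sorted from the deepest
    (most accurate) exit to the shallowest; each entry holds the
    predicted layer latencies and transfer sizes of the sub-network
    that ends at that exit.
    Returns (branch, partition_point), or None if even the shallowest
    exit cannot meet the latency budget.
    """
    for b in branches:
        cut_points = range(len(b["device_lat"]) + 1)
        best_p = min(cut_points,
                     key=lambda p: co_inference_latency(
                         b["device_lat"], b["edge_lat"],
                         b["trans_bytes"], p, bandwidth))
        best_t = co_inference_latency(b["device_lat"], b["edge_lat"],
                                      b["trans_bytes"], best_p, bandwidth)
        if best_t <= latency_budget:
            # Deeper exits are tried first, so the first feasible plan
            # is also the most accurate one that meets the deadline.
            return b, best_p
    return None
```

Because deeper exits are examined first, the first feasible plan maximizes accuracy subject to the latency requirement, matching the optimization goal the paper states.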
Evaluation and Results
Edgent was evaluated using a prototype in which a Raspberry Pi acted as the resource-constrained mobile device. The prototype was tested across a range of bandwidth conditions and latency requirements, demonstrating the effectiveness of the proposed partitioning and right-sizing strategies. Notably, the results show substantial latency reductions compared with device-only or edge-only execution, while maintaining a satisfactory level of inference accuracy.
Implications and Future Directions
The proposed Edgent framework constitutes a significant step towards realizing efficient and real-time edge intelligence, particularly relevant as IoT devices and mobile applications increasingly adopt DNN-based models. The framework addresses key challenges associated with bandwidth variability and computational resource limitations, laying a robust foundation for practical deployment in various real-time applications such as augmented reality (AR) and autonomous robotics.
Looking forward, the paper opens several avenues for future research within the domain of AI and edge computing. Potential areas include further refinement of the proposed partitioning and right-sizing strategies to accommodate varying degrees of computational heterogeneity and extending the framework to encompass energy efficiency considerations. Given the fast-paced advancements in edge computing technologies and the continuous evolution of DNN architectures, Edgent provides a compelling blueprint for the development of next-generation intelligent edge solutions.