
CoEdge: Cooperative DNN Inference with Adaptive Workload Partitioning over Heterogeneous Edge Devices (2012.03257v1)

Published 6 Dec 2020 in cs.NI, cs.CV, and cs.DC

Abstract: Recent advances in artificial intelligence have driven increasing intelligent applications at the network edge, such as smart home, smart factory, and smart city. To deploy computationally intensive Deep Neural Networks (DNNs) on resource-constrained edge devices, traditional approaches have relied on either offloading workload to the remote cloud or optimizing computation at the end device locally. However, the cloud-assisted approaches suffer from the unreliable and delay-significant wide-area network, and the local computing approaches are limited by the constrained computing capability. Towards high-performance edge intelligence, the cooperative execution mechanism offers a new paradigm, which has attracted growing research interest recently. In this paper, we propose CoEdge, a distributed DNN computing system that orchestrates cooperative DNN inference over heterogeneous edge devices. CoEdge utilizes available computation and communication resources at the edge and dynamically partitions the DNN inference workload adaptive to devices' computing capabilities and network conditions. Experimental evaluations based on a realistic prototype show that CoEdge outperforms status-quo approaches in saving energy with close inference latency, achieving up to 25.5%~66.9% energy reduction for four widely-adopted CNN models.

Cooperative DNN Inference on Heterogeneous Edge Devices: A Review of CoEdge

The paper proposes a novel framework, CoEdge, which addresses the deployment of Deep Neural Network (DNN) inference tasks across heterogeneous edge devices. As devices at the edge of the network continue to proliferate, use cases such as smart homes, factories, and cities demand efficient solutions for performing computation-intensive DNN tasks. This research examines the potential for an edge-centric approach, splitting the workload dynamically among devices to harness local computing resources efficiently.

CoEdge operates in two phases. In the setup phase, each device is profiled for its computation intensity, clock frequency, available memory, and per-unit power consumption for both computing and communication. This profile is what allows CoEdge to tailor resource allocation in the runtime phase, when the actual DNN task is executed. At runtime, CoEdge dynamically partitions the input data according to current network conditions and each device's computing capability.
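As a rough illustration of this profiling-driven partitioning, the sketch below splits an input's rows across devices in proportion to a simple throughput score. The device names, profile fields, and weighting formula are hypothetical, not CoEdge's actual API or model:

```python
# Hypothetical sketch of capability-proportional input partitioning.
# Field names and the throughput score are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class DeviceProfile:
    name: str
    compute_freq_ghz: float   # profiled compute frequency
    bandwidth_mbps: float     # measured link bandwidth to this device

def partition_rows(total_rows: int, profiles: list[DeviceProfile]) -> dict[str, int]:
    """Assign input rows proportionally to an illustrative throughput score."""
    # Blend compute and communication capability (toy weighting, not CoEdge's).
    scores = {p.name: p.compute_freq_ghz * min(1.0, p.bandwidth_mbps / 100.0)
              for p in profiles}
    total = sum(scores.values())
    shares = {name: int(total_rows * s / total) for name, s in scores.items()}
    # Hand any rounding remainder to the highest-scoring device.
    shares[max(scores, key=scores.get)] += total_rows - sum(shares.values())
    return shares

devices = [
    DeviceProfile("pi-1", 1.5, 50.0),
    DeviceProfile("tx2", 2.0, 100.0),
    DeviceProfile("pc", 3.5, 100.0),
]
print(partition_rows(224, devices))
```

In CoEdge itself the partition is recomputed as network conditions change, so a routine like this would be re-invoked whenever the profiles are refreshed.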

A significant contribution of this work is the adaptive workload partitioning scheme, which minimizes energy consumption while adhering to user-specified latency constraints. The authors prove that optimizing partition sizes over the available resources is NP-hard. They then formulate a constrained programming model and show that a linear-program relaxation yields effective solutions in real time. The resulting partitioning algorithm thus addresses the intricacies of balancing computation and communication tradeoffs, a recurrent theme in the orchestration of distributed systems.
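To make the LP-relaxation idea concrete, consider a simplified (hypothetical) version of the problem: minimize total energy Σ eᵢxᵢ subject to each device finishing its share within the deadline (xᵢ/fᵢ ≤ T) and the shares covering the whole workload (Σ xᵢ = W). For this simplified structure, filling the cheapest-energy devices first up to their deadline capacity is the exact LP optimum. This is a sketch of the general approach, not CoEdge's actual algorithm:

```python
# Simplified LP-style allocation (illustrative, not CoEdge's algorithm):
# minimize sum(e_i * x_i) s.t. x_i / f_i <= T and sum(x_i) == W.
# Greedy fill by energy cost is the exact optimum for this LP structure.

def min_energy_partition(workload, deadline, devices):
    """devices: list of (name, speed_units_per_sec, energy_per_unit)."""
    alloc = {name: 0.0 for name, _, _ in devices}
    remaining = workload
    # Cheapest energy-per-unit first; each device caps at speed * deadline.
    for name, speed, energy in sorted(devices, key=lambda d: d[2]):
        take = min(remaining, speed * deadline)
        alloc[name] = take
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 1e-9:
        raise ValueError("deadline infeasible for this device set")
    return alloc

devices = [("pi", 10.0, 1.0), ("tx2", 40.0, 0.5), ("pc", 100.0, 2.0)]
print(min_energy_partition(120.0, 2.0, devices))
```

The full formulation in the paper also accounts for communication costs and memory limits, which is what makes the exact problem NP-hard and motivates the relaxation.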

In experimental evaluations, the paper showcases the efficacy of CoEdge across a diverse array of DNN models, including AlexNet, VGG-f, GoogLeNet, and MobileNet. The prototype, comprising four Raspberry Pis, a Jetson TX2, and a desktop PC, achieves up to 66.9% energy savings over the baseline systems MoDNN and Musical Chair while maintaining competitive inference latency. The authors provide detailed insights into system behavior under various conditions, including deadline looseness, scalability, and network fluctuations. Notably, CoEdge demonstrates robustness to network instability, maintaining latency performance while optimizing energy consumption.

This research has both theoretical and practical implications. Theoretically, it advances the understanding of distributed load balancing under resource constraints, quintessential for modern edge computing paradigms. Practically, it points towards the feasibility of mixed-environment DNN deployment, which optimizes for latency and energy—a dual objective that has tangible benefits in real-world IoT deployments. Future work could well expand CoEdge to adapt to more dynamic contexts, incorporating predictive mechanisms for device reliability and workload fluctuations.

In conclusion, the CoEdge framework enriches the domain of distributed edge intelligence, capitalizing on collaborative computing and adaptive partitioning methodologies. As edge devices become more diverse and integral to AI applications, such frameworks will likely form the bedrock of efficient, cooperative DNN deployments at the edge.

Authors (5)
  1. Liekang Zeng
  2. Xu Chen
  3. Zhi Zhou
  4. Lei Yang
  5. Junshan Zhang
Citations (174)