
Split Computing and Early Exiting for Deep Learning Applications: Survey and Research Challenges (2103.04505v4)

Published 8 Mar 2021 in eess.SP and cs.LG

Abstract: Mobile devices such as smartphones and autonomous vehicles increasingly rely on deep neural networks (DNNs) to execute complex inference tasks such as image classification and speech recognition, among others. However, continuously executing the entire DNN on mobile devices can quickly deplete their battery. Although task offloading to cloud/edge servers may decrease the mobile device's computational burden, erratic patterns in channel quality, network, and edge server load can lead to a significant delay in task execution. Recently, approaches based on split computing (SC) have been proposed, where the DNN is split into a head and a tail model, executed respectively on the mobile device and on the edge server. Ultimately, this may reduce bandwidth usage as well as energy consumption. Another approach, called early exiting (EE), trains models to embed multiple "exits" earlier in the architecture, each providing increasingly higher target accuracy. Therefore, the trade-off between accuracy and delay can be tuned according to the current conditions or application demands. In this paper, we provide a comprehensive survey of the state of the art in SC and EE strategies by presenting a comparison of the most relevant approaches. We conclude the paper by providing a set of compelling research challenges.

An Overview of Split Computing and Early Exiting for Deep Learning Applications

The proliferation of deep neural networks (DNNs) in mobile and edge computing applications, spanning computer vision (CV) and natural language processing (NLP), underscores the need for efficient computation strategies. Split computing (SC) and early exiting (EE) are particularly promising in this regard, enabling resource-constrained execution of DNNs with little loss in accuracy.

Split Computing (SC)

Split computing partitions a DNN into a head model executed on the mobile device and a tail model executed on an edge server, balancing the computational load between the two while containing the latency introduced by data transfer. Early SC approaches split unmodified DNN architectures; however, because modern DNNs rarely contain layers whose output is small enough to transmit cheaply, the intermediate tensors exchanged between head and tail often dominated the end-to-end delay. Consequently, recent work injects artificial bottlenecks into the architecture to compress the representation at the split point. These bottleneck architectures have shown significant promise, reducing data transfer sizes by up to 94% with minimal impact on accuracy.
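
As a concrete illustration, the following PyTorch sketch injects an artificial bottleneck into a split model. It is a minimal sketch under stated assumptions: the module names (BottleneckHead, BottleneckTail), channel counts, and layer choices are illustrative, not the specific architectures compared in the survey.

```python
import torch
import torch.nn as nn

class BottleneckHead(nn.Module):
    """Head model run on the mobile device: early backbone layers followed
    by an artificial bottleneck that compresses the intermediate
    representation before transmission to the edge server."""
    def __init__(self, in_channels=64, bottleneck_channels=3):
        super().__init__()
        # Illustrative early layers of a CNN backbone (e.g., a ResNet-like stem).
        self.stem = nn.Sequential(
            nn.Conv2d(3, in_channels, kernel_size=7, stride=2, padding=3),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )
        # Encoder: squeeze channels and spatial size so the tensor sent over
        # the network is far smaller than the raw intermediate feature map.
        self.encoder = nn.Conv2d(in_channels, bottleneck_channels,
                                 kernel_size=2, stride=2)

    def forward(self, x):
        return self.encoder(self.stem(x))  # compressed tensor to transmit

class BottleneckTail(nn.Module):
    """Tail model run on the edge server: decodes the compressed
    representation and completes inference."""
    def __init__(self, bottleneck_channels=3, out_channels=64, num_classes=1000):
        super().__init__()
        # Decoder: restore channel width before the remaining backbone layers.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(bottleneck_channels, out_channels,
                               kernel_size=2, stride=2),
            nn.ReLU(inplace=True),
        )
        self.rest = nn.Sequential(  # stand-in for the remaining backbone
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(out_channels, num_classes),
        )

    def forward(self, z):
        return self.rest(self.decoder(z))
```

The design choice to place the encoder on the device and the decoder on the server is what shrinks the transmitted tensor relative to splitting an unmodified network at the same depth.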

Training bottleneck-modified models relies on techniques such as knowledge distillation and, in particular, head network distillation (HND). In these methodologies, a pre-trained teacher model guides the student model (the bottleneck-injected variant) so that task performance is preserved and the compressed intermediate representations remain informative for downstream computation.
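
A hedged sketch of such a distillation loop follows. It assumes a frozen teacher_head exposing the teacher's intermediate representation at the split point, and a hypothetical student_head/student_decoder pair; the exact loss formulations surveyed in the paper may differ.

```python
import torch
import torch.nn as nn

def head_network_distillation(student_head, student_decoder, teacher_head,
                              loader, epochs=10, lr=1e-3, device="cpu"):
    """Illustrative HND-style loop: the student head (with bottleneck) and
    a decoder are trained so that the reconstructed representation matches
    the frozen teacher's intermediate output at the split point."""
    teacher_head.eval()                      # teacher stays frozen
    for p in teacher_head.parameters():
        p.requires_grad_(False)

    params = list(student_head.parameters()) + list(student_decoder.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    mse = nn.MSELoss()

    for _ in range(epochs):
        for x, _ in loader:                  # labels unused: distillation only
            x = x.to(device)
            with torch.no_grad():
                target = teacher_head(x)     # teacher's representation
            recon = student_decoder(student_head(x))
            loss = mse(recon, target)        # match teacher at the split point
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Because only the head-side modules are retrained, this procedure is inexpensive, and the tail can reuse the teacher's remaining layers unchanged.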

Early Exiting (EE)

Early exiting, first exemplified by BranchyNet, exploits the inherent overparameterization of DNNs by attaching intermediate classifiers at various network depths. Inference terminates at the first exit whose prediction is sufficiently confident, offering a dynamic trade-off between accuracy and latency that can be tuned to real-time requirements. The paper highlights two prevalent training strategies: training the exit classifiers jointly with the main model, or training them in a separate stage.
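
The following sketch, in the spirit of BranchyNet, shows confidence-based exit selection at inference time, using softmax entropy as the confidence measure; the threshold value and the per-sample (batch size 1) assumption are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def early_exit_inference(backbone_blocks, exit_heads, x, entropy_threshold=0.5):
    """BranchyNet-style inference sketch: run the backbone block by block,
    query the attached exit classifier after each block, and stop as soon
    as the softmax entropy falls below the threshold.
    Assumes batch size 1 (entropy.item() reads a single scalar)."""
    for block, head in zip(backbone_blocks, exit_heads):
        x = block(x)
        logits = head(x)
        probs = F.softmax(logits, dim=-1)
        # Low entropy => peaked distribution => confident prediction.
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
        if entropy.item() < entropy_threshold:
            return logits                    # confident enough: exit here
    return logits  # fall through to the final (most accurate) exit
```

Raising the threshold makes the model exit earlier (lower latency, lower accuracy); lowering it pushes more inputs toward the final classifier.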

The paper evaluates both approaches across several computing platforms, demonstrating how these methods can offload a substantial part of the computational burden from mobile devices. Notably, SC is predominantly applied to CV tasks because of the large input data sizes involved, while EE has found utility in both CV and NLP, particularly with computationally intensive transformer models.

Implications and Future Directions

The survey provides a comprehensive synthesis of SC and EE methodologies, underscoring their potential to significantly improve the operational efficiency of DNN applications in edge computing environments. The implications are vast, offering paths to more responsive, resource-efficient AI applications in domains such as autonomous navigation, real-time speech recognition, and beyond.

However, challenges remain, notably in optimizing bottleneck placement and in grounding these techniques in an information-theoretic framework. Future research could build on such theoretical insights and develop data-driven methods that select split points and exit policies based on real-time metrics and application demands.

In summary, the evolution of SC and EE represents a critical step towards sustainable AI in decentralized computing environments. The strategies outlined and evaluated in this paper provide a foundational framework for developing next-generation AI systems that are not only performant but also efficiently aligned with the constraints and capabilities of edge computing infrastructures.

Authors (3)
  1. Yoshitomo Matsubara (14 papers)
  2. Marco Levorato (50 papers)
  3. Francesco Restuccia (64 papers)
Citations (175)