End-to-End Delay Minimization based on Joint Optimization of DNN Partitioning and Resource Allocation for Cooperative Edge Inference (2310.12937v1)

Published 19 Oct 2023 in cs.DC

Abstract: Cooperative inference in Mobile Edge Computing (MEC), achieved by deploying partitioned Deep Neural Network (DNN) models between resource-constrained user equipments (UEs) and edge servers (ESs), has emerged as a promising paradigm. First, we consider scenarios with continuous AI task arrivals, such as object detection on video streams, and use a serial queuing model to accurately evaluate the End-to-End (E2E) delay of cooperative edge inference. Second, to enhance the long-term performance of inference systems, we formulate a multi-slot stochastic E2E delay optimization problem that jointly considers model partitioning and multi-dimensional resource allocation. Finally, to solve this problem, we introduce a Lyapunov-guided Multi-Dimensional Optimization algorithm (LyMDO) that decouples the original problem into per-slot deterministic problems, in which Deep Reinforcement Learning (DRL) and convex optimization jointly optimize the partitioning decisions and the complementary resource allocation. Simulation results show that our approach effectively reduces E2E delay while balancing long-term resource constraints.
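The abstract names two mechanisms: a serial queuing model for E2E delay and a Lyapunov-guided split of a long-term problem into per-slot problems. The sketch below illustrates both under assumptions that are ours, not the paper's. Each stage (UE compute, uplink, ES compute) is treated as an M/M/1 queue in tandem, so the mean E2E delay is the sum of per-stage sojourn times 1/(mu - lambda). A long-term budget on ES capacity is tracked by a virtual queue Q, and each slot minimizes V * delay + Q * usage (the standard drift-plus-penalty form, with the constant Q * budget term dropped). The layer profile, rates, and all names (LAYER_FLOPS, OUT_BITS, stage_delay, e2e_delay, lyapunov_step) are hypothetical. An exhaustive search over partition points and discrete ES frequency levels stands in for LyMDO's DRL and convex-optimization steps.

```python
import math

# Hypothetical 4-layer DNN profile (illustrative values, not taken from the paper).
LAYER_FLOPS = [2e7, 4e7, 4e7, 2e7]        # compute per layer, in FLOPs
OUT_BITS = [8e5, 4e5, 2e5, 1e5, 1e4]      # OUT_BITS[k]: bits uploaded when cutting before layer k (k = 0 sends the raw input)


def stage_delay(work, capacity, lam):
    """Mean sojourn time of one M/M/1 stage: 1 / (mu - lam), where mu = capacity / work."""
    if work == 0:
        return 0.0                        # empty stage contributes no delay
    mu = capacity / work                  # service rate in tasks per second
    if mu <= lam:
        return math.inf                   # unstable queue: the backlog grows without bound
    return 1.0 / (mu - lam)


def e2e_delay(k, lam, f_ue, rate, f_es):
    """Tandem queue UE compute -> uplink -> ES compute, with the DNN cut before layer k."""
    local = sum(LAYER_FLOPS[:k])
    remote = sum(LAYER_FLOPS[k:])
    tx_bits = OUT_BITS[k] if remote > 0 else 0.0   # nothing is uploaded if inference is fully local
    return (stage_delay(local, f_ue, lam)
            + stage_delay(tx_bits, rate, lam)
            + stage_delay(remote, f_es, lam))


def lyapunov_step(Q, V, lam, f_ue, rate, es_levels, budget):
    """One slot of drift-plus-penalty: minimize V * delay + Q * usage, then update Q.

    Exhaustive search over (partition point, ES frequency level) is a stand-in
    for the DRL + convex optimization described in the abstract.
    """
    n = len(LAYER_FLOPS)
    best = None
    for k in range(n + 1):
        for f_es in es_levels:
            usage = f_es / 1e9 if k < n else 0.0       # ES capacity reserved, in GFLOP/s
            cost = V * e2e_delay(k, lam, f_ue, rate, f_es) + Q * usage
            if best is None or cost < best[0]:
                best = (cost, k, f_es, usage)
    _, k, f_es, usage = best
    Q_next = max(Q + usage - budget, 0.0)              # virtual queue grows while usage exceeds the budget
    return k, f_es, Q_next


if __name__ == "__main__":
    Q = 0.0
    for t in range(3):
        k, f_es, Q = lyapunov_step(Q, V=100.0, lam=5.0, f_ue=1e9, rate=2e7,
                                   es_levels=[1e9, 2e9, 4e9], budget=2.0)
        print(t, k, f_es, Q)
    # Expected output:
    # 0 1 4000000000.0 2.0
    # 1 1 2000000000.0 2.0
    # 2 1 2000000000.0 2.0
```

With these illustrative numbers, the first slot (Q = 0) picks the fastest ES level and cuts before layer 1, which minimizes delay alone. Once Q has grown to 2.0, the Q * usage penalty shifts the choice to 2 GFLOP/s, which matches the budget, so Q stops growing. Raising V trades more budget violation for lower delay, which is the usual knob in Lyapunov-based designs.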
