
Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering (2402.18927v1)

Published 29 Feb 2024 in cs.CV, cs.MM, and cs.NI

Abstract: This paper proposes a novel edge computing enabled real-time video analysis system for intelligent visual devices. The proposed system consists of a tracking-assisted object detection module (TAODM) and a region of interest module (ROIM). TAODM adaptively determines the offloading decision to process each video frame locally with a tracking algorithm or to offload it to the edge server inferred by an object detection model. ROIM determines each offloading frame's resolution and detection model configuration to ensure that the analysis results can return in time. TAODM and ROIM interact jointly to filter the repetitive spatial-temporal semantic information to maximize the processing rate while ensuring high video analysis accuracy. Unlike most existing works, this paper investigates the real-time video analysis systems where the intelligent visual device connects to the edge server through a wireless network with fluctuating network conditions. We decompose the real-time video analysis problem into the offloading decision and configurations selection sub-problems. To solve these two sub-problems, we introduce a double deep Q network (DDQN) based offloading approach and a contextual multi-armed bandit (CMAB) based adaptive configurations selection approach, respectively. A DDQN-CMAB reinforcement learning (DCRL) training framework is further developed to integrate these two approaches to improve the overall video analyzing performance. Extensive simulations are conducted to evaluate the performance of the proposed solution, and demonstrate its superiority over counterparts.


Summary

  • The paper introduces a novel DDQN-CMAB framework that dynamically optimizes offloading decisions and model configurations for real-time video analysis.
  • It uses reinforcement learning to balance processing speed and detection accuracy under varying network conditions.
  • Simulations on a multi-camera pedestrian dataset show superior performance, highlighting its practical impact on edge computing applications.

Adaptive Edge Computing for Real-Time Video Analysis through DDQN-CMAB Reinforcement Learning

Decomposition of Real-Time Video Analysis Problem

In tackling the challenges of real-time video analysis in edge computing environments, this paper introduces a novel approach that manages the trade-off between frame processing rate and analysis accuracy. The discussion centers on decomposing the video analysis problem into two pivotal sub-problems:

  • The optimal offloading decision, which determines whether video frames should be processed locally or offloaded to an edge server.
  • The adaptive selection of detection model configurations and offloading resolutions to ensure timely and accurate analysis results.
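The per-frame control flow implied by this decomposition can be sketched as follows. This is an illustrative outline only: the agent interfaces are hypothetical stand-ins passed as callables, not the paper's actual API, and the configuration values are made up for the example.

```python
def process_frame(should_offload, pick_configuration, network_state):
    """Route one video frame per the two sub-problems:
    (1) offload vs. local processing, (2) configuration selection.

    should_offload: callable deciding the offloading sub-problem
    pick_configuration: callable returning (resolution, model) for offloaded frames
    network_state: dict of observed network conditions (e.g. bandwidth)
    """
    if should_offload(network_state):
        # Offloaded frame: choose resolution and detection model so the
        # result can return in time under current network conditions.
        resolution, model = pick_configuration(network_state)
        return ("edge", resolution, model)
    # Local frame: reuse temporal semantics via a lightweight tracker.
    return ("local", None, "tracker")
```

In the full system, `should_offload` would be realized by the DDQN agent and `pick_configuration` by the CMAB agent described below.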

Approach and Contribution

Double Deep Q Network (DDQN) Based Offloading: For the first sub-problem, the paper leverages a Double Deep Q Network (DDQN) to dynamically make offloading decisions. This method mitigates the over-estimation bias of traditional DQN by decoupling action selection from action evaluation: the online network selects the next action, while a separate target network evaluates its Q-value. This improves the decision of whether to process a video frame locally on the device or offload it for edge processing.
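The decoupling at the heart of DDQN reduces to a small change in how the bootstrap target is computed. The sketch below shows that target computation in isolation, using plain numpy arrays in place of network outputs; the function name and reward/Q values are illustrative, not from the paper.

```python
import numpy as np

def ddqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN bootstrap target.

    next_q_online: Q-values of the next state from the online network
                   (used only to SELECT the action)
    next_q_target: Q-values of the next state from the target network
                   (used only to EVALUATE the selected action)
    """
    if done:
        return reward
    best_action = int(np.argmax(next_q_online))          # selection: online net
    return reward + gamma * next_q_target[best_action]   # evaluation: target net
```

Plain DQN would instead take `max(next_q_target)` for both selection and evaluation, which is what produces the over-estimation bias.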

Contextual Multi-Armed Bandit (CMAB) Based Configurations Selection: For the second sub-problem, the paper implements a Contextual Multi-Armed Bandit (CMAB) approach. This method dynamically adjusts the configurations of the detection models and offloading resolutions based on varying network conditions and video content complexities. Through this approach, the system can adaptively select the most suitable configurations to match real-time network status and video content, optimizing both the processing rate and detection accuracy.
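A standard way to instantiate such a contextual bandit is LinUCB, sketched minimally below. Here each arm stands for one (resolution, detection model) configuration and the context vector could encode features like estimated bandwidth or frame complexity; these feature choices and the LinUCB variant are assumptions for illustration, not necessarily the paper's exact design.

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB contextual bandit over configuration arms."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm design matrix
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward vector

    def select(self, context):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                 # ridge-regression reward estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        """Record the observed reward (e.g. accuracy minus latency penalty)."""
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```

The reward signal would combine timeliness and detection accuracy, so the bandit learns which configuration suits each observed network/content context.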

This paper's significant contributions can be summarized as follows:

  • It proposes an adaptive spatial-temporal semantic filtering-based video analysis system that utilizes edge computing to achieve real-time video analysis with high accuracy under uncertain and fluctuating network conditions.
  • It designs a DDQN-CMAB Reinforcement Learning (DCRL) framework that jointly solves the offloading decision and adaptive configuration selection problems under unpredictable network conditions.
  • Through extensive simulations using the Multi-camera Pedestrian Video Dataset, the proposed DCRL framework demonstrates superior performance over existing benchmarks in terms of maintaining a high frame processing rate while ensuring detection accuracy.

Theoretical and Practical Implications

The research presents a unique intersection of DDQN and CMAB methodologies to address real-time video analysis challenges in edge computing. Theoretical insights into the problem decomposition strategy and the application of reinforcement learning techniques provide a strong foundation for future studies in intelligent visual devices and edge computing optimization.

Practically, the proposed DCRL framework offers a scalable solution for applications across various domains requiring real-time video analysis, such as autonomous driving, surveillance, and urban management. By efficiently managing the computational constraints of intelligent devices and the fluctuating nature of network conditions, this approach promotes the broader adoption and optimization of edge computing applications.

Future Directions in AI and Edge Computing

Looking ahead, this research opens new avenues in the integration of reinforcement learning with edge computing. Future work could explore the extension of the DCRL framework to manage multi-agent systems in edge environments, further enhancing the scalability and efficiency of real-time video analysis. Additionally, incorporating advancements in deep learning and neural network optimization could yield further improvements in both offloading decision-making and configuration selection processes.

In summary, this paper marks a significant step towards realizing the full potential of edge computing in supporting real-time video analysis. By addressing the critical balance between processing rate and accuracy through an innovative reinforcement learning approach, it sets the stage for future advancements in intelligent visual devices and edge computing capabilities.