Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics (2001.03569v2)

Published 10 Jan 2020 in cs.CV

Abstract: Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which only preserves and transmits the most critical information, stand at two ends of the scale. That is, feature compression offers compactness and efficiency to serve machine vision, while video coding offers full fidelity, catering to human perception. Recent endeavors in emerging video compression trends, e.g., deep learning based coding tools and end-to-end image/video coding, and in the MPEG-7 compact feature descriptor standards, i.e., Compact Descriptors for Visual Search and Compact Descriptors for Video Analysis, promote sustainable and fast development in their own directions, respectively. In this paper, thanks to booming AI technology, e.g., prediction and generation models, we explore a new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. Aligning with the rising Analyze then Compress instance, Digital Retina, we first give the definition, formulation, and paradigm of VCM. Meanwhile, we systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides the academic and industrial evidence to realize the collaborative compression of video and feature streams in a broad range of AI applications. Finally, we propose potential VCM solutions, and the preliminary results demonstrate performance and efficiency gains. Further directions are discussed as well.


Video Coding for Machines: Collaborative Compression and Intelligent Analytics

The paper "Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics" by Ling-Yu Duan et al. presents an in-depth exploration of a novel concept termed Video Coding for Machines (VCM). The authors aim to bridge the gap between video coding optimized for human perception and feature coding tailored to machine vision tasks. This endeavor is timely considering the proliferation of applications such as smart cities and IoT, where efficient video data management is crucial.

Key Concepts and Innovations

VCM is introduced as a framework that combines video and feature coding to serve both machine and human vision. Traditionally, video coding has focused on reconstructing video frames with high fidelity for human viewing, as exemplified by standards such as MPEG-4 AVC (H.264) and HEVC (H.265). Feature compression, on the other hand, prioritizes transmitting the key information needed for machine tasks such as recognition and retrieval, culminating in the MPEG-7 standards Compact Descriptors for Visual Search (CDVS) and Compact Descriptors for Video Analysis (CDVA).

The authors propose that integrating these two approaches could offer significant benefits, and VCM leverages AI advances in prediction and generative models to achieve this integration. Aligning with the Analyze then Compress paradigm, as instantiated by the Digital Retina, VCM targets the collaborative compression of video and feature streams.

Review of Existing Techniques

The paper systematically reviews existing video and feature compression techniques and how they apply to the VCM framework. For video compression, it highlights the transition from traditional block-based hybrid coding schemes to deep learning-based approaches. Deep learning methods provide more flexible, data-driven models that improve prediction and compression efficiency through architectures such as CNNs and RNNs. Meanwhile, feature compression efforts have focused on reducing bitrate while maintaining discriminative power, a key concern for machine vision applications.
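To make the trade-off between bitrate and discriminative power concrete, here is a minimal sketch that compresses a real-valued descriptor into a compact binary code with random-hyperplane hashing (SimHash). This is a generic, swapped-in technique chosen for illustration. It is not the CDVS or CDVA coding pipeline, and the function names (`make_hyperplanes`, `simhash`, `hamming`, `angle`) and sizes (128 dimensions, 256 bits) are hypothetical. Each bit records which side of a random hyperplane the descriptor falls on, so the fraction of differing bits between two codes estimates the angle between the original descriptors divided by pi.

```python
import math
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(vec, planes):
    """One bit per hyperplane: 1 if the descriptor lies on its non-negative side."""
    return [1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0 for plane in planes]

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def angle(u, v):
    """Angle in radians between two real-valued descriptors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

if __name__ == "__main__":
    dim, n_bits = 128, 256
    rng = random.Random(1)
    query = [rng.gauss(0, 1) for _ in range(dim)]
    near = [q + 0.3 * rng.gauss(0, 1) for q in query]  # perturbed copy: same object
    far = [rng.gauss(0, 1) for _ in range(dim)]         # unrelated descriptor
    planes = make_hyperplanes(dim, n_bits)
    hq, hn, hf = (simhash(v, planes) for v in (query, near, far))
    print("raw size (float32 bits):", dim * 32, "compact size (bits):", n_bits)
    print("near: hamming/n_bits =", hamming(hq, hn) / n_bits, "angle/pi =", angle(query, near) / math.pi)
    print("far:  hamming/n_bits =", hamming(hq, hf) / n_bits, "angle/pi =", angle(query, far) / math.pi)
```

Here a 128-dimensional float32 descriptor (4096 bits) shrinks to 256 bits, yet the perturbed copy stays much closer in Hamming distance than the unrelated descriptor, which is the kind of discriminative power compact descriptors aim to preserve.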

Proposed VCM Solutions

The authors propose several potential solutions within the VCM paradigm:

Deep Intermediate Feature Compression: This involves compressing intermediate network features rather than top-layer features or raw video data, to minimize computing overhead while supporting diverse machine vision tasks.
Predictive Coding with Collaborative Feedback: In this solution, the video encoding process is assisted by sparse motion patterns extracted from key frames, enabling more efficient inter-frame predictions. It utilizes feedback mechanisms to iteratively refine predictions, thereby optimizing both coding efficiency and task performance.
Enhancing Predictive Coding with Scalable Feedback: Building on the previous mechanism, this approach introduces scalability, where additional resources are used to incrementally enhance the video and feature quality through feedback loops.
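The intermediate-feature idea can be sketched as follows, assuming a post-ReLU C x H x W feature map, 8-bit uniform scalar quantization, and tiling of the channels into a single 2-D frame so that a conventional image or video codec (not shown) could compress it. All names here (`quantize`, `dequantize`, `tile_channels`) and the toy tensor are hypothetical illustrations, not the paper's implementation.

```python
import math

def quantize(channels, bits=8):
    """Uniformly quantize a C x H x W feature tensor (nested lists) to integers."""
    flat = [v for ch in channels for row in ch for v in row]
    lo, hi = min(flat), max(flat)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # step size; guard constant maps
    q = [[[round((v - lo) / scale) for v in row] for row in ch] for ch in channels]
    return q, lo, scale

def dequantize(q, lo, scale):
    """Map integer levels back to approximate feature values."""
    return [[[lo + v * scale for v in row] for row in ch] for ch in q]

def tile_channels(q, cols):
    """Arrange C channels of H x W into one 2-D frame, `cols` channel tiles per row."""
    c, h, w = len(q), len(q[0]), len(q[0][0])
    rows = math.ceil(c / cols)
    frame = [[0] * (cols * w) for _ in range(rows * h)]  # unused tiles stay zero
    for idx, ch in enumerate(q):
        r0, c0 = (idx // cols) * h, (idx % cols) * w
        for y in range(h):
            for x in range(w):
                frame[r0 + y][c0 + x] = ch[y][x]
    return frame

if __name__ == "__main__":
    # Toy 4-channel, 2 x 3 feature map standing in for a real network activation.
    feat = [[[(c + 1) * 0.1 * (y * 3 + x) for x in range(3)] for y in range(2)] for c in range(4)]
    q, lo, scale = quantize(feat, bits=8)
    rec = dequantize(q, lo, scale)
    n = 4 * 2 * 3
    err = max(abs(a - b) for c1, c2 in zip(feat, rec) for r1, r2 in zip(c1, c2) for a, b in zip(r1, r2))
    print("float32 bits:", n * 32, "8-bit bits:", n * 8)
    print("max abs error:", err, "half step:", scale / 2)
    for row in tile_channels(q, cols=2):
        print(row)
```

Quantizing 32-bit floats to 8 bits cuts the raw size by a factor of four before any entropy coding, and the reconstruction error is bounded by half a quantization step. Tiling is one simple way to hand the result to an existing codec.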

These solutions are especially relevant for applications that demand robust analytics and visualization capabilities, such as surveillance systems in smart cities.

Results and Implications

Preliminary experiments demonstrate that these approaches reduce bitrate while maintaining high recognition accuracy and video reconstruction quality. This suggests that VCM can economize both computational and communication resources while improving performance across various AI applications.

Future Directions

The paper identifies several future directions, such as establishing entropy bounds for tasks, developing biologically inspired models for data acquisition and coding, and handling domain shifts in prediction models. These considerations are crucial for further refining the VCM framework to meet the changing requirements of both human-centric and machine-centric applications.
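One textbook way to read "entropy bounds for tasks" is sketched here; it is an illustration under stated assumptions, not a formulation taken from the paper. Assume the video is a discrete random variable X and that a machine task only needs T = g(X) for some deterministic function g (for example, a class label). Because T is a function of X, its entropy cannot exceed that of X, and by the source coding theorem an optimal prefix code for T has expected length \ell^*(T) within one bit of H(T):

```latex
H(T) = H\big(g(X)\big) \le H(X),
\qquad
H(T) \le \mathbb{E}\big[\ell^*(T)\big] < H(T) + 1 .
```

So when H(T) is much smaller than H(X), coding for the task rather than for the full signal can, in principle, require far fewer bits. Characterizing such bounds for realistic vision tasks is the open question this future direction points to.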

In conclusion, VCM represents a significant evolution in video compression, harmonizing human and machine vision through collaborative approaches. It opens new avenues for integrated AI systems and efficient data processing methodologies, making it an exciting area for ongoing research and development.


Authors (5)

Ling-Yu Duan
Jiaying Liu
Wenhan Yang
Tiejun Huang
Wen Gao

Citations (169)
