
Neural Video Compression with Diverse Contexts (2302.14402v3)

Published 28 Feb 2023 in eess.IV, cs.CV, and cs.MM

Abstract: For any video codec, coding efficiency relies heavily on whether the current signal to be encoded can find relevant contexts in the previously reconstructed signals. Traditional codecs have verified that more contexts bring substantial coding gains, albeit at a high computational cost. For the emerging neural video codec (NVC), however, contexts remain limited, leading to low compression ratios. To boost NVC, this paper proposes increasing context diversity in both temporal and spatial dimensions. First, we guide the model to learn hierarchical quality patterns across frames, which enriches long-term and yet high-quality temporal contexts. Furthermore, to tap the potential of the optical flow-based coding framework, we introduce a group-based offset diversity where cross-group interaction is proposed for better context mining. In addition, this paper adopts a quadtree-based partition to increase spatial context diversity when encoding the latent representation in parallel. Experiments show that our codec obtains a 23.5% bitrate saving over the previous SOTA NVC. Better yet, our codec surpasses the under-development next-generation traditional codec ECM in both RGB and YUV420 colorspaces, in terms of PSNR. The code is at https://github.com/microsoft/DCVC.

Citations (96)

Summary

  • The paper introduces hierarchical quality patterns to enrich long-term temporal contexts, improving frame reconstruction and reducing error propagation.
  • The paper presents a group-based offset diversity mechanism that enhances temporal context mining through cross-group interactions to better capture complex motions.
  • The paper employs quadtree-based spatial partitioning for parallel entropy coding, achieving significant compression gains with a 23.5% bitrate saving over previous models.

Neural Video Compression with Diverse Contexts

The paper "Neural Video Compression with Diverse Contexts" addresses a prominent limitation of neural video codecs (NVCs): the contexts they draw on are narrow, which caps compression efficiency. Whereas traditional codecs exploit a wide range of contexts to minimize redundancy, NVCs rely on narrower context-extraction techniques and consequently achieve suboptimal compression ratios.

Key Contributions

The paper introduces several innovations to enhance context diversity in both temporal and spatial dimensions:

  1. Hierarchical Quality Patterns: Inspired by traditional codecs' hierarchical quality structures, the paper proposes guiding neural models to learn these patterns across video frames. This approach enriches long-term, high-quality temporal contexts, which are crucial for better frame reconstruction and reducing error propagation.
  2. Offset Diversity with Cross-Group Interaction: The authors design a group-based offset diversity mechanism. By introducing multiple offsets and enabling cross-group interactions, this method enhances temporal context mining. This strategy allows for more robust handling of complex motions and occlusions.
  3. Quadtree-Based Spatial Partitioning: To diversify spatial contexts, the paper adopts a quadtree-based partitioning method for encoding latent representations. This approach enables parallel entropy coding with enhanced distribution estimation accuracy, rivaling auto-regressive models without incurring their computational burdens.
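The hierarchical quality idea in item 1 can be illustrated as a per-frame loss weighting: frames at certain positions in the group of pictures are pushed toward higher reconstruction quality so that later frames can reference long-term, high-quality temporal contexts. The function below is a minimal sketch under assumed values; the period and weight values are hypothetical, not the paper's actual training schedule.

```python
def hierarchical_weights(num_frames, period=4, high=1.8, low=0.7):
    """Hypothetical per-frame distortion weights for training.

    Every `period`-th frame gets a larger weight, encouraging the model
    to reconstruct it at higher quality. Later frames that reference it
    then inherit a cleaner long-term temporal context, which helps
    suppress error propagation along the frame chain.
    """
    return [high if t % period == 0 else low for t in range(num_frames)]
```

In a training loop, each frame's distortion term would be multiplied by its weight before summing into the rate-distortion loss.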
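Item 2 can be pictured as follows: the feature channels are split into groups, each group is warped by its own motion offset (so different groups can track different motions), and a cross-group interaction step lets the groups exchange the contexts they mined. The toy function below uses 1-D signals, integer shifts, and a simple mean as the interaction; the real method operates on 2-D deformable-style warping with learned mixing, so treat this purely as an illustration.

```python
def group_offset_warp(channels, offsets):
    """Toy group-based offset diversity on 1-D signals.

    channels: list of per-channel signals (lists of numbers).
    offsets:  one integer shift per group; channels are split evenly
              into len(offsets) groups.

    Each group is shifted by its own offset (a stand-in for flow-guided
    warping), then a cross-group interaction (here, a plain mean over
    corresponding channels of each group) fuses what the groups found.
    """
    g = len(offsets)
    per = len(channels) // g          # channels per group
    n = len(channels[0])
    warped = []
    for gi, off in enumerate(offsets):
        for ch in channels[gi * per:(gi + 1) * per]:
            # shift with edge clamping, mimicking border handling in warping
            warped.append([ch[min(max(i + off, 0), n - 1)] for i in range(n)])
    # cross-group interaction: average corresponding channels across groups
    mixed = []
    for ci in range(per):
        rows = [warped[gi * per + ci] for gi in range(g)]
        mixed.append([sum(v) / g for v in zip(*rows)])
    return warped, mixed
```

The point of the interaction step is that a group whose offset mis-tracked an occluded region can still borrow context from a group that tracked it well.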
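For item 3, the essence is scheduling: each spatial position of the latent is assigned to one of a few coding steps; all positions in the same step are entropy-coded in parallel, and positions decoded in earlier steps serve as spatial context for later ones. The sketch below uses a fixed 2x2 assignment pattern as a simplified stand-in for the paper's quadtree partition, and the particular step order is a hypothetical choice.

```python
def quadtree_coding_order(h, w):
    """Assign each position of an h x w latent to one of four coding steps.

    Positions sharing a step index are entropy-coded in parallel; earlier
    steps provide decoded neighbors as spatial context for later steps.
    The 2x2 pattern and its ordering here are illustrative only.
    """
    step_of = {(0, 0): 0, (1, 1): 1, (0, 1): 2, (1, 0): 3}  # hypothetical order
    return [[step_of[(i % 2, j % 2)] for j in range(w)] for i in range(h)]
```

Because only four passes are needed regardless of resolution, this keeps decoding far cheaper than a fully auto-regressive scan over every position, while still giving later steps informative spatial neighbors.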

Performance and Implications

The proposed codec demonstrates significant improvements over state-of-the-art (SOTA) neural video codecs and even outperforms traditional codecs. Specifically, the paper reports a 23.5% bitrate saving over the previous SOTA NVC and, in terms of PSNR, surpasses ECM, the reference software for the next-generation traditional standard, in both RGB and YUV420 colorspaces. This marks a notable milestone in NVC development.

The experimental results bolster the codec's efficacy: it leverages temporal correlations and spatial redundancies more effectively, challenging the established benchmarks set by traditional codecs. This advancement is particularly important for applications demanding real-time processing and high compression efficiency, such as live streaming and video conferencing.

Speculation on Future Developments

The implications for future AI applications are manifold. With robust context exploitation, neural video codecs can aim for even greater efficiency, potentially setting new standards for video compression technology. Future research could further reduce computational complexity, refine context selection methodologies, and investigate adaptive quality settings tailored to each content type.

Overall, this paper represents a significant leap forward in enhancing the compression efficiency of neural video codecs by employing diverse context strategies. The improvements over both previous neural models and traditional coding methods suggest a promising trajectory for further integration of deep learning technologies in video compression.
