Overview of Hierarchical Learned Video Compression with Recurrent Enhancement
Recent developments in video streaming have emphasized the need for efficient video compression, especially given the growing prevalence of high-resolution content and its substantial share of mobile data traffic. Traditional video codecs, notably H.264 and H.265, rely on handcrafted modules that cannot be jointly optimized end-to-end within a data-driven framework. The paper "Learning for Video Compression with Hierarchical Quality and Recurrent Enhancement" introduces a deep-learning approach to video compression built around hierarchical quality layering and a recurrent enhancement network.
Proposed Method: HLVC Framework
The paper presents the Hierarchical Learned Video Compression (HLVC) framework, which combines three quality layers with a recurrent enhancement network, WRQE. The layers are organized hierarchically: frames compressed at higher quality serve as references for compressing and enhancing lower-quality frames, which improves overall coding efficiency.
- Layer 1 employs an image compression algorithm to encode frames with the highest quality, establishing key reference points similar to "I-frames" in traditional codecs. These frames are crucial for halting error propagation during encoding and decoding.
- Layer 2 utilizes the Bi-Directional Deep Compression (BDDC) network. This network exploits bi-directional references from Layer 1 frames to compress intermediate frames at medium quality.
- Layer 3 adopts the Single Motion Deep Compression (SMDC) network to encode frames at the lowest quality. The innovation here is the use of a single motion map to represent the motion of multiple frames, reducing the bit-rate spent on motion information (see the sketch after this list).
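To make the hierarchy and the single-motion-map idea concrete, below is a minimal Python/PyTorch sketch. It assumes, for illustration, a group of pictures whose first and last frames form Layer 1, whose middle frame forms Layer 2, and whose remaining frames form Layer 3, and it approximates the motion to the nearer of two Layer 3 frames by halving the transmitted flow to the farther one under a roughly-linear-motion assumption. The function names and the bilinear warp are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def assign_layers(gop_size=10):
    """Illustrative quality-layer assignment within one group of pictures:
    first/last frame -> Layer 1, middle frame -> Layer 2, rest -> Layer 3."""
    layers = {}
    for t in range(gop_size + 1):
        if t in (0, gop_size):
            layers[t] = 1
        elif t == gop_size // 2:
            layers[t] = 2
        else:
            layers[t] = 3
    return layers

def backward_warp(frame, flow):
    """Bilinearly warp `frame` (N,C,H,W) with an optical flow (N,2,H,W)."""
    n, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0).to(frame)
    coords = grid + flow
    # Normalize coordinates to [-1, 1] as expected by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=-1)  # (N,H,W,2)
    return F.grid_sample(frame, grid_norm, align_corners=True)

def predict_two_frames(reference, flow_to_far):
    """Single-motion-map trick: transmit one flow (reference -> farther frame)
    and approximate the flow to the nearer frame as half of it, assuming
    roughly linear motion across the three frames."""
    pred_far = backward_warp(reference, flow_to_far)
    pred_near = backward_warp(reference, 0.5 * flow_to_far)
    return pred_near, pred_far
```

Because only one motion map is entropy-coded for two frames, the motion bit-rate is roughly halved for Layer 3 relative to sending a flow per frame.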
Recurrent enhancement is performed by the Weighted Recurrent Quality Enhancement (WRQE) network. WRQE fuses correlated multi-frame information, weighted by quality features that the decoder extracts from the bitstream it already receives, so the enhancement incurs no additional bit-rate overhead.
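A simplified way to picture the weighting is a recurrent cell whose update gate is scaled by a per-frame quality score; since the decoder already knows each frame's quality layer and quantization settings from the bitstream, this score costs no extra bits. The ConvGRU-style cell below is an assumed stand-in for illustration, not WRQE's actual architecture.

```python
import torch
import torch.nn as nn

class QualityWeightedRecurrentCell(nn.Module):
    """Sketch of quality-weighted recurrence: a ConvGRU-style cell whose
    update gate is scaled by a per-frame quality score, so high-quality
    frames contribute more to the memory used to enhance later frames."""

    def __init__(self, channels=32):
        super().__init__()
        self.gates = nn.Conv2d(2 * channels, 2 * channels, 3, padding=1)
        self.candidate = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, x, hidden, quality):
        # `quality` in [0, 1] is derived from bitstream side information
        # (e.g. the frame's layer), so it requires no additional bits.
        zr = torch.sigmoid(self.gates(torch.cat([x, hidden], dim=1)))
        z, r = zr.chunk(2, dim=1)
        z = z * quality  # low-quality frames write less into the memory
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * hidden], dim=1)))
        return (1 - z) * hidden + z * h_tilde
```

Running the cell over decoded frames in display order lets information from high-quality Layer 1 and Layer 2 frames propagate to enhance the low-quality Layer 3 frames between them.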
Numerical Validation and Comparative Assessment
The HLVC framework was validated against existing state-of-the-art methods, including x265 in its "Low-Delay P (LDP) very fast" configuration. In extensive experiments on the JCT-VC and UVG test sets, HLVC demonstrated superior rate-distortion performance, achieving bit-rate reductions of up to 35.94% in terms of MS-SSIM compared with x265 and surpassing learned video compression methods such as DVC and the approach of Habibian et al.
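Savings of this kind are conventionally reported as Bjøntegaard delta rate (BD-rate): the rate-distortion curves of the two codecs are interpolated in the log-rate domain and the average horizontal gap gives the percentage bit-rate change at equal quality. The sketch below implements the standard cubic-fit BD-rate on hypothetical rate-distortion points; it illustrates the metric itself, not the paper's evaluation script.

```python
import numpy as np

def bd_rate(rate_anchor, dist_anchor, rate_test, dist_test):
    """Bjontegaard delta rate: average % bit-rate change of the test codec
    relative to the anchor at equal distortion (PSNR or MS-SSIM in dB).
    Negative values mean the test codec saves bits."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit cubic polynomials log(rate) = p(distortion) for each codec.
    p_a = np.polyfit(dist_anchor, lr_a, 3)
    p_t = np.polyfit(dist_test, lr_t, 3)
    # Integrate over the overlapping distortion range.
    lo = max(min(dist_anchor), min(dist_test))
    hi = min(max(dist_anchor), max(dist_test))
    int_a, int_t = np.polyint(p_a), np.polyint(p_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1) * 100.0

# Hypothetical RD points (kbps, quality in dB) for illustration only.
anchor = ([800, 1200, 1800, 2600], [13.0, 14.1, 15.0, 15.8])
test = ([600, 900, 1400, 2100], [13.1, 14.2, 15.1, 15.9])
print(f"BD-rate: {bd_rate(anchor[0], anchor[1], test[0], test[1]):.2f}%")
```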
Implications and Future Directions
The HLVC framework represents a significant step forward in learned video compression: it assigns differing quality levels to frames within a video stream by design and efficiently reuses high-quality reference frames for recurrent enhancement. The implications for practical deployment are substantial, pointing toward streaming solutions that maintain high visual quality while reducing bandwidth usage.
The results suggest several avenues for future research. One promising direction is adapting the hierarchical structure dynamically to the content being compressed, which could extend HLVC's applicability to real-time settings. Further work could also investigate more advanced motion prediction and compensation models, building on the motion estimation strategy used in the SMDC network.
In conclusion, the presented HLVC framework not only sets a new benchmark for learned video compression but also opens new possibilities for enhancing video transmission efficiency at both theoretical and practical levels, making it a valuable contribution to ongoing advancements in video technology and deep learning applications.