DVC: An End-to-end Deep Video Compression Framework (1812.00101v3)

Published 30 Nov 2018 in eess.IV and cs.CV

Abstract: Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both classical architecture in the conventional video compression method and the powerful non-linear representation ability of neural networks, we propose the first end-to-end video compression deep model that jointly optimizes all the components for video compression. Specifically, learning based optical flow estimation is utilized to obtain the motion information and reconstruct the current frames. Then we employ two auto-encoder style neural networks to compress the corresponding motion and residual information. All the modules are jointly learned through a single loss function, in which they collaborate with each other by considering the trade-off between reducing the number of compression bits and improving quality of the decoded video. Experimental results show that the proposed approach can outperform the widely used video coding standard H.264 in terms of PSNR and be even on par with the latest standard H.265 in terms of MS-SSIM. Code is released at https://github.com/GuoLusjtu/DVC.

Authors (6)
  1. Guo Lu (39 papers)
  2. Wanli Ouyang (358 papers)
  3. Dong Xu (167 papers)
  4. Xiaoyun Zhang (35 papers)
  5. Chunlei Cai (7 papers)
  6. Zhiyong Gao (17 papers)
Citations (581)

Summary

Overview of "DVC: An End-to-end Deep Video Compression Framework"

The paper presents a deep learning approach to video compression. The end-to-end deep video compression (DVC) framework integrates components that conventional codecs handle separately, and it improves rate-distortion performance by optimizing all of them jointly through deep neural networks.

Core Contributions and Methodology

The proposed DVC framework addresses a key limitation of conventional video compression systems, which rely on hand-crafted modules such as block-based motion estimation and the discrete cosine transform (DCT). Instead, the DVC model integrates motion estimation, motion compensation, and residual compression into a single deep learning framework, trained with one loss function that trades off bit rate against reconstruction quality, as written out below.
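
Concretely, training minimizes a weighted sum of distortion and estimated coding cost. A standard way to write this rate-distortion objective, using illustrative notation consistent with the paper's high-level description rather than its exact equations, is:

```latex
% Joint rate-distortion objective (illustrative notation):
%   x_t : current frame,  \hat{x}_t : its reconstruction
%   \hat{m}_t, \hat{y}_t : quantized motion and residual representations
%   R(\cdot) : estimated bits to encode a representation
%   \lambda : weight controlling the rate-quality trade-off
L = \lambda \, d(x_t, \hat{x}_t) + R(\hat{m}_t) + R(\hat{y}_t)
```

Here d is a distortion measure such as mean squared error, and the rate terms are estimated with learned entropy models so that the whole objective is differentiable.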

  • Motion Estimation and Compression: DVC uses a CNN-based optical flow estimator to obtain motion information, and an auto-encoder style network compresses the estimated flow, trading the bits spent on motion against reconstruction fidelity.
  • Motion Compensation: A motion compensation network refines the warped prediction of the current frame, reducing artifacts introduced by motion prediction.
  • Residual Encoder-Decoder: A non-linear residual encoder-decoder network replaces traditional linear transforms such as the DCT, handling residual errors more flexibly.
  • Joint Optimization: All of these components are trained together, exploiting the end-to-end trainability of neural networks to improve compression performance over traditional codecs (a minimal sketch of the resulting data flow follows this list).
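
To make the data flow concrete, the following is a minimal PyTorch-style sketch of one coding step. The module names, shapes, and simplifications here are illustrative stand-ins, not the authors' released implementation: the real model uses a learned optical flow network (the flow below is a placeholder tensor), a refinement CNN for motion compensation, and learned entropy models for the rate term.

```python
# Minimal sketch of one DVC-style coding step (hypothetical modules; the
# authors' released code at https://github.com/GuoLusjtu/DVC differs).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AutoEncoder(nn.Module):
    """Stand-in for the motion / residual auto-encoders in the paper."""
    def __init__(self, in_ch: int):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, 64, kernel_size=3, stride=2, padding=1)
        self.dec = nn.ConvTranspose2d(64, in_ch, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        latent = self.enc(x)
        # Additive uniform noise is the usual differentiable proxy for
        # quantization during training; inference would round instead.
        latent_hat = latent + torch.rand_like(latent) - 0.5
        return self.dec(latent_hat), latent_hat

def warp(frame, flow):
    """Backward-warp `frame` with a dense flow field via bilinear sampling."""
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float().to(frame)   # (h, w, 2)
    coords = grid + flow.permute(0, 2, 3, 1)                 # add (dx, dy)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    xs_n = 2 * coords[..., 0] / (w - 1) - 1
    ys_n = 2 * coords[..., 1] / (h - 1) - 1
    return F.grid_sample(frame, torch.stack((xs_n, ys_n), dim=-1),
                         align_corners=True)

# One coding step: previous reconstruction -> current reconstruction.
flow = torch.zeros(1, 2, 64, 64)       # placeholder for the flow CNN output
prev_rec = torch.rand(1, 3, 64, 64)    # previously decoded frame
cur = torch.rand(1, 3, 64, 64)         # current frame to encode

motion_codec = AutoEncoder(in_ch=2)
residual_codec = AutoEncoder(in_ch=3)

flow_hat, _ = motion_codec(flow)                 # compress/decompress motion
prediction = warp(prev_rec, flow_hat)            # motion compensation
residual_hat, _ = residual_codec(cur - prediction)
cur_rec = prediction + residual_hat              # final reconstruction

distortion = F.mse_loss(cur_rec, cur)            # the D term; the R term would
print(distortion.item())                         # come from entropy models
```

At test time the additive-noise proxy would be replaced by actual rounding plus arithmetic coding of the quantized latents.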

Experimental Results

The authors provide extensive experimental results for the DVC framework. The model outperforms the widely used H.264 video coding standard in terms of PSNR and is on par with the newer H.265 standard when evaluated with the MS-SSIM metric. These results show that the model achieves high compression ratios while preserving visual quality.

  • At comparable quality, DVC requires a lower bit rate than H.264, yielding higher PSNR and MS-SSIM at the same rate (a reference PSNR computation is sketched below).
  • Because motion estimation is trained end-to-end with the motion and residual codecs, the learned flow is biased toward being cheap to compress without discarding information useful for prediction, in contrast to traditional methods that optimize each component separately.
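
For reference, PSNR is the log-scale inverse of mean squared error; the small NumPy sketch below shows the computation (this is generic reference code, not the paper's evaluation script).

```python
import numpy as np

def psnr(ref: np.ndarray, rec: np.ndarray, peak: float = 255.0) -> float:
    """PSNR in dB for 8-bit frames: 10 * log10(peak^2 / MSE)."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))

ref = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)   # toy frame
rec = np.clip(ref.astype(int) + np.random.randint(-3, 4, ref.shape), 0, 255)
print(f"PSNR: {psnr(ref, rec):.2f} dB")
```

MS-SSIM, the second reported metric, is a multi-scale structural-similarity measure that tracks perceived quality more closely than PSNR.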

Theoretical and Practical Implications

The DVC model stands as a robust alternative to traditional video compression algorithms, showcasing the potential of deep learning techniques in domains dominated by manual optimizations. Its ability to replace classical methods with a fully neural approach signifies a shift towards more adaptable and potentially more powerful solutions in video processing.

For practitioners, the DVC framework provides a foundation for further exploration into enhancing video compression techniques, such as incorporating advanced neural architecture search or integrating with other deep learning-based video analysis tasks.

Future Prospects and Considerations

The paper opens several avenues for future work. One direction is to integrate bi-directional prediction or to explore alternative loss functions better suited to perceptual quality. In addition, improvements in model efficiency and speed are needed for practical deployment.

The release of the DVC code further invites collaboration and potential enhancements as researchers incorporate this framework into broader applications, from video streaming services to on-device video processing systems.

In summary, the paper presents a comprehensive, deep learning-based solution to video compression, offering significant improvements over conventional methods and providing a solid groundwork for future innovations in the field of video processing and compression.
