EVC: Towards Real-Time Neural Image Compression with Mask Decay (2302.05071v1)

Published 10 Feb 2023 in eess.IV, cs.CV, and cs.MM

Abstract: Neural image compression has surpassed state-of-the-art traditional codecs (H.266/VVC) for rate-distortion (RD) performance, but suffers from large complexity and separate models for different rate-distortion trade-offs. In this paper, we propose an Efficient single-model Variable-bit-rate Codec (EVC), which is able to run at 30 FPS with 768x512 input images and still outperforms VVC for the RD performance. By further reducing both encoder and decoder complexities, our small model even achieves 30 FPS with 1920x1080 input images. To bridge the performance gap between our different capacities models, we meticulously design the mask decay, which transforms the large model's parameters into the small model automatically. And a novel sparsity regularization loss is proposed to mitigate shortcomings of $L_p$ regularization. Our algorithm significantly narrows the performance gap by 50% and 30% for our medium and small models, respectively. At last, we advocate the scalable encoder for neural image compression. The encoding complexity is dynamic to meet different latency requirements. We propose decaying the large encoder multiple times to reduce the residual representation progressively. Both mask decay and residual representation learning greatly improve the RD performance of our scalable encoder. Our code is at https://github.com/microsoft/DCVC.

Authors

Guo-Hua Wang
Jiahao Li
Bin Li
Yan Lu

Summary

Overview

The paper presents the Efficient single-model Variable-bit-rate Codec (EVC), a neural image codec that aims to combine state-of-the-art rate-distortion (RD) performance with real-time speed. Neural codecs have surpassed traditional codecs such as H.266/VVC in RD performance, but they suffer from high computational complexity and need separate models for different RD trade-offs. EVC addresses both problems: a single model runs at 30 frames per second (FPS) on 768x512 inputs while still outperforming VVC, and a smaller model with reduced encoder and decoder complexity reaches 30 FPS on 1920x1080 inputs.
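The RD trade-off mentioned above can be made concrete with a toy example. The sketch below is not EVC's codec: it applies plain uniform quantization to a random stand-in for a latent tensor and uses empirical entropy as a rough rate proxy. The function names and the Gaussian latent are assumptions made for illustration only. It shows how a single adjustable step size moves one operating point along the rate-distortion curve.

```python
import numpy as np

def quantize(latent, step):
    """Uniform scalar quantization with an adjustable step size (illustrative)."""
    return np.round(latent / step) * step

def empirical_bits_per_symbol(symbols):
    """Empirical entropy of the quantized values, used here as a rough rate proxy."""
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

rng = np.random.default_rng(0)
latent = rng.normal(0.0, 4.0, size=10_000)  # assumption: stand-in for a learned latent

results = []
for step in (0.5, 1.0, 2.0, 4.0):
    q = quantize(latent, step)
    rate = empirical_bits_per_symbol(q)
    mse = float(np.mean((latent - q) ** 2))
    results.append((step, rate, mse))
    print(f"step={step}: rate proxy={rate:.2f} bits/symbol, mse={mse:.3f}")
```

A larger step lowers the rate proxy and raises the distortion, so changing the step at inference time selects a different trade-off without switching models.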

Key Contributions

Single-Model Variable-Bit-Rate Handling: EVC lets a single model cover multiple RD trade-offs by using adjustable quantization steps, removing the need for a separate model per rate point.
Reduced Complexity: The architecture uses Depth-Conv blocks and spatial priors, which parallelize well on GPUs. Complexity is reduced in both the encoder and the decoder to reach real-time speed; the smaller models give up some RD performance in exchange, a gap that mask decay is designed to narrow.
Mask Decay Mechanism: Mask decay automatically transforms the large model's parameters into a smaller model, and a new sparsity regularization loss is proposed to mitigate shortcomings of $L_p$ regularization. Together they narrow the RD performance gap between model sizes by 50% for the medium model and 30% for the small model.
Scalable Encoder: EVC advocates a scalable encoder whose complexity can be adjusted dynamically to meet different latency requirements. The large encoder is decayed multiple times so that the residual representation is reduced progressively; both mask decay and residual representation learning improve the scalable encoder's RD performance.
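The general idea of shrinking a model through decaying masks can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the linear decay schedule, the choice of which channels to prune, and the helper names `decay_masks` and `masked_layer` are all assumptions, and the paper's sparsity regularization loss is not reproduced here. Per-channel masks start at 1, the masks of channels marked for removal decay to 0, and the zeroed channels can then be dropped to obtain a compact layer.

```python
import numpy as np

def decay_masks(masks, prune, rate):
    """Shrink the masks of channels marked for removal; leave the others unchanged."""
    out = masks.copy()
    out[prune] = np.maximum(out[prune] - rate, 0.0)
    return out

def masked_layer(x, weight, masks):
    """Linear layer whose output channels are scaled by per-channel masks."""
    return (x @ weight.T) * masks

rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 4))   # large layer: 8 output channels, 4 inputs
masks = np.ones(8)
prune = np.arange(8) >= 5          # hypothetical choice: remove the last 3 channels

for _ in range(8):                 # in real training the weights would also be updated here
    masks = decay_masks(masks, prune, rate=0.125)

small_weight = weight[masks > 0]   # compact layer once the pruned masks have reached zero
x = rng.normal(size=(2, 4))
large_out = masked_layer(x, weight, masks)
small_out = x @ small_weight.T
print(small_weight.shape, np.allclose(large_out[:, ~prune], small_out))
```

Because the masks shrink gradually instead of being set to zero in one step, the remaining channels have time to adapt during training, which is the intuition behind carrying a large model's parameters over into a smaller one.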

Experimental Results

The paper's experimental evaluation reports results on commonly used datasets such as Kodak, Tecnick, and HEVC test sequences. EVC remains competitive with state-of-the-art neural codecs in RD performance and outperforms VTM, the reference software for H.266/VVC.

Performance Metrics: A single EVC model runs at 30 FPS on 768x512 inputs while still outperforming H.266/VVC in RD performance. The small model, with reduced encoder and decoder complexity, reaches 30 FPS on 1920x1080 (full HD) inputs.
Computational Efficiency: Latency evaluations on NVIDIA GPUs indicated substantially lower latency than earlier models, supporting the real-time figures above.
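To put the frame-rate figures in perspective, the short calculation below converts 30 FPS into a per-image time budget and compares the pixel throughput needed at the two resolutions quoted in the abstract (768x512 and 1920x1080). The helper names are illustrative, and the numbers are plain arithmetic, not measurements from the paper.

```python
def frame_budget_ms(fps):
    """Time available per image, in milliseconds, at a given frame rate."""
    return 1000.0 / fps

def pixel_throughput(width, height, fps):
    """Pixels that must be processed per second at a given resolution and frame rate."""
    return width * height * fps

budget = frame_budget_ms(30)
small_res = pixel_throughput(768, 512, 30)
full_hd = pixel_throughput(1920, 1080, 30)

print(f"budget per image at 30 FPS: {budget:.1f} ms")
print(f"768x512   @ 30 FPS: {small_res / 1e6:.1f} Mpixel/s")
print(f"1920x1080 @ 30 FPS: {full_hd / 1e6:.1f} Mpixel/s ({full_hd / small_res:.2f}x more)")
```

At 30 FPS each image has about 33.3 ms for the whole pipeline, and full HD requires roughly 5.3 times the pixel throughput of 768x512, which is why the full HD figure is reported for the lower-complexity small model.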

Implications and Future Work

EVC is a significant step toward making neural image compression viable for real-time applications such as streaming, storage, and transmission, where efficient data handling is critical.

The paper points to several areas for future work: reducing redundancy in the decoder, automatically determining the optimal complexity of each layer in a compression model, and developing pruning algorithms that focus on the optimization process rather than on parameter importance. These directions could further improve the efficiency and adaptability of neural compression techniques.

In conclusion, EVC establishes a practical, high-performance, and efficient image codec, paving the way for future research and applications in the dynamic field of neural image compression.

GitHub

microsoft/DCVC: Deep Contextual Video Compression (https://github.com/microsoft/DCVC)