EVC: Towards Real-Time Neural Image Compression with Mask Decay
Overview
The paper presents the Efficient single-model Variable-bit-rate Codec (EVC), a neural image codec that combines state-of-the-art rate-distortion (RD) performance with real-time speed. Neural codecs have recently surpassed traditional codecs such as H.266/VVC in RD performance, but they typically suffer from high computational complexity and need a separate model for each RD trade-off. EVC addresses both problems: it runs at 30 frames per second (FPS) on 768x512 input images while still outperforming VVC in RD performance, and its small model reaches 30 FPS on 1920x1080 input images, making it substantially more efficient than earlier neural codecs.
Key Contributions
- Single-Model Variable-Bit-Rate Handling: EVC lets a single model cover a range of RD trade-offs by using adjustable quantization steps, removing the need to train and store a separate model for each trade-off, as existing neural codecs typically do.
- Reduced Complexity: The architecture is built from Depth-Conv blocks and uses spatial priors, components chosen because they parallelize well on GPUs. By progressively reducing the complexity of both the encoder and the decoder, EVC reaches real-time speed while retaining strong RD performance.
- Mask Decay Mechanism: Mask decay automatically transforms a large model's parameters into those of a smaller model, and a sparsity regularization loss is introduced to mitigate the shortcomings of Lp regularization. Together they narrow the RD performance gap between the large model and the medium and small models by 50% and 30%, respectively.
- Scalable Encoder: EVC advocates a scalable encoder that can encode images at different complexities to meet different latency requirements. With residual representation learning and mask decay, it substantially narrows the performance gap between its large and small encoder configurations.
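The variable-bit-rate idea from the first contribution can be illustrated with uniform scalar quantization: the same latent is quantized with different step sizes, and a larger step yields fewer distinct symbols (lower rate) at the cost of higher distortion. This is a minimal sketch, not EVC's implementation. The Gaussian "latent", the single scalar step shared by all elements, and the use of empirical entropy as a rate proxy are all assumptions; a real codec uses a trained analysis transform, possibly learned per-channel steps, and an entropy model with arithmetic coding. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(latent, q_step):
    """Uniform scalar quantization: integer symbols plus the dequantized reconstruction."""
    symbols = [round(v / q_step) for v in latent]
    recon = [s * q_step for s in symbols]
    return symbols, recon


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits/symbol, a crude stand-in for the coded rate."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def rd_point(latent, q_step):
    """Return (rate proxy, mean squared error) for one quantization step."""
    symbols, recon = quantize(latent, q_step)
    rate = empirical_entropy_bits(symbols)
    mse = sum((a - b) ** 2 for a, b in zip(latent, recon)) / len(latent)
    return rate, mse


# Stand-in for a trained encoder's latent; the same latent is reused for every trade-off.
rng = random.Random(0)
latent = [rng.gauss(0.0, 3.0) for _ in range(10_000)]

curve = [(q, *rd_point(latent, q)) for q in (0.5, 1.0, 2.0, 4.0)]
for q, rate, mse in curve:
    print(f"q_step={q}: rate ~ {rate:.2f} bits/symbol, mse = {mse:.3f}")
```

Because only `q_step` changes, one set of weights serves every point on the curve; the decoder multiplies the decoded symbols by the same step, so the step has to be known on both sides.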
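The summary does not spell out the internals of the Depth-Conv block, so the following is only a rough cost estimate under an assumption: that its core is a depthwise convolution paired with 1x1 (pointwise) convolutions, as in depthwise-separable designs. Counting multiply-accumulates (MACs) shows why such blocks are much cheaper than a dense convolution with the same receptive field. The layer size and function names are hypothetical.

```python
def dense_conv_macs(h, w, c_in, c_out, k):
    """MACs of a dense k x k convolution with stride 1 and 'same' padding."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    depthwise = h * w * c_in * k * k  # one k x k filter per input channel
    pointwise = h * w * c_in * c_out  # 1x1 conv mixes information across channels
    return depthwise + pointwise


# Hypothetical layer: 128 -> 128 channels on a feature map at 1/16 scale of a
# 1920x1080 frame (1080 / 16 = 67.5, rounded up to 68; 1920 / 16 = 120).
h, w, c, k = 68, 120, 128, 3
dense = dense_conv_macs(h, w, c, c, k)
separable = depthwise_separable_macs(h, w, c, c, k)
print(f"dense: {dense / 1e6:.1f} MMACs, separable: {separable / 1e6:.1f} MMACs, "
      f"ratio: {dense / separable:.2f}x")
```

MAC counts are only a proxy: actual GPU latency also depends on memory traffic and on how well each operator parallelizes, which is why latency is measured directly in the experiments.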
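The toy sketch below conveys the general idea behind decaying a mask instead of pruning abruptly, as in the mask decay contribution: channels slated for removal are scaled by a mask that shrinks gradually to zero, so the rest of the network can adapt during training, and once the mask reaches zero those channels can be deleted without changing the output. Everything here is an illustrative assumption rather than the paper's algorithm: the linear decay schedule, the L1-norm selection criterion, and the single tiny linear layer. The training updates that would run between decay steps are omitted, and all helper names are hypothetical.

```python
def matvec(weight, x):
    """y = W x for a weight given as a list of rows (one row per output channel)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in weight]


def select_channels_to_remove(weight, n_remove):
    """Hypothetical criterion: remove output channels with the smallest L1 weight norm."""
    norms = [sum(abs(v) for v in row) for row in weight]
    order = sorted(range(len(weight)), key=lambda i: norms[i])
    return set(order[:n_remove])


def decayed_mask(n_channels, removed, step, decay_steps):
    """Kept channels stay at 1; removed channels decay linearly from 1 to 0."""
    factor = max(0.0, 1.0 - step / decay_steps)
    return [factor if i in removed else 1.0 for i in range(n_channels)]


def masked_forward(weight, x, mask):
    """Layer output with each channel scaled by its mask value."""
    return [m * y for m, y in zip(mask, matvec(weight, x))]


def compact_weight(weight, removed):
    """Physically drop the removed channels to obtain the smaller layer."""
    return [row for i, row in enumerate(weight) if i not in removed]


weight = [[0.9, -0.4, 0.2], [0.05, 0.01, -0.02], [-0.7, 0.3, 0.8], [0.1, -0.05, 0.03]]
x = [1.0, 2.0, -1.0]
removed = select_channels_to_remove(weight, n_remove=2)  # channels 1 and 3
decay_steps = 4

for step in range(decay_steps + 1):
    # In real training, gradient updates would run here, letting the kept
    # channels absorb what the fading channels used to contribute.
    print(step, decayed_mask(len(weight), removed, step, decay_steps))

final = masked_forward(weight, x, decayed_mask(len(weight), removed, decay_steps, decay_steps))
kept = [y for i, y in enumerate(final) if i not in removed]
compact = matvec(compact_weight(weight, removed), x)
print(kept == compact)  # prints True: the fully decayed model equals the compact one
```

In a full network, removing an output channel also removes the matching input channel of the next layer; the single-layer toy leaves that out.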
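The scalable-encoder idea can be sketched as a base encoder plus optional stages that each add a residual to the latent, so encoding can stop early when the latency budget is tight while the decoder still receives a latent of the same form. This is a hypothetical illustration of residual representation learning, not EVC's architecture: the toy stages (fixed fractions of the input) and the per-stage costs are placeholders, not measurements from the paper, and the function names are invented for illustration.

```python
from typing import Callable, List, Sequence

Stage = Callable[[List[float]], List[float]]


def make_stage(scale: float) -> Stage:
    """Toy stage that returns a fixed fraction of its input (placeholder for a network)."""
    return lambda img: [scale * v for v in img]


def scalable_encode(image: List[float], base: Stage,
                    residual_stages: Sequence[Stage], n_residual: int) -> List[float]:
    """Base latent plus the residuals predicted by the first n_residual stages."""
    latent = base(image)
    for stage in residual_stages[:n_residual]:
        latent = [y + r for y, r in zip(latent, stage(image))]
    return latent


def pick_depth(stage_costs_ms: Sequence[float], budget_ms: float) -> int:
    """Number of residual stages that fit in the budget; stage_costs_ms[0] is the base encoder."""
    total = stage_costs_ms[0]
    if total > budget_ms:
        raise ValueError("even the base encoder exceeds the latency budget")
    n = 0
    for cost in stage_costs_ms[1:]:
        if total + cost > budget_ms:
            break
        total += cost
        n += 1
    return n


# The toy target is the input itself: the base recovers half of it and each
# residual stage recovers half of what is still missing.
base = make_stage(0.5)
residual_stages = [make_stage(0.5 ** (k + 2)) for k in range(3)]

costs = [10.0, 4.0, 4.0, 4.0]  # placeholder per-stage costs in ms, not measured numbers
image = [1.0, -2.0, 4.0]
n = pick_depth(costs, budget_ms=20.0)
latent = scalable_encode(image, base, residual_stages, n)
print(n, latent)
```

Here a 20 ms budget admits the base encoder and two residual stages, and each added stage moves the latent closer to the toy full-quality target.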
Experimental Results
The experiments evaluate EVC on commonly used datasets: Kodak, Tecnick, and the HEVC test sequences. The proposed models achieve RD performance comparable to state-of-the-art neural codecs and outperform VTM, the H.266/VVC reference software.
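RD comparisons against VTM are usually summarized as a Bjøntegaard delta rate (BD-rate): the average bitrate difference between two RD curves over their overlapping quality range. The sketch below implements the classic cubic-polynomial variant in plain Python. It is not the paper's evaluation code, the RD points are made up, and the function names are hypothetical; many evaluations use a piecewise cubic interpolation variant instead, which can give slightly different numbers.

```python
import math


def _fit_cubic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x^2 + c3*x^3 via the normal equations."""
    n = 4
    aug = [[sum(x ** (i + j) for x in xs) for j in range(n)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    coeffs = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        coeffs[i] = (aug[i][n] - sum(aug[i][k] * coeffs[k] for k in range(i + 1, n))) / aug[i][i]
    return coeffs


def _integrate(coeffs, a, b):
    """Definite integral of the polynomial sum(c_i * x^i) over [a, b]."""
    return sum(c * (b ** (i + 1) - a ** (i + 1)) / (i + 1) for i, c in enumerate(coeffs))


def bd_rate(anchor, test):
    """BD-rate (percent) of `test` vs `anchor`, each a list of >= 4 (bitrate, psnr) points.

    Negative values mean the test codec needs fewer bits for the same quality.
    """
    pa, pt = [p for _, p in anchor], [p for _, p in test]
    lo, hi = max(min(pa), min(pt)), min(max(pa), max(pt))
    if hi <= lo:
        raise ValueError("the RD curves do not overlap in PSNR")
    # Map the overlapping PSNR interval to [-1, 1] to keep the fit well conditioned.
    center, half = (lo + hi) / 2, (hi - lo) / 2

    def fit(points):
        return _fit_cubic([(p - center) / half for _, p in points],
                          [math.log(r) for r, _ in points])

    avg_log_diff = (_integrate(fit(test), -1.0, 1.0) - _integrate(fit(anchor), -1.0, 1.0)) / 2.0
    return (math.exp(avg_log_diff) - 1.0) * 100.0


anchor = [(0.20, 30.0), (0.40, 33.0), (0.70, 36.0), (1.10, 38.5)]  # made-up (bpp, PSNR) points
test = [(0.8 * r, p) for r, p in anchor]  # same quality at 80% of the bitrate
print(f"{bd_rate(anchor, test):.2f}%")  # -20.00% for this constructed example
```

Averaging log-rate differences (rather than raw rates) makes the result a relative saving, so a curve shifted uniformly to 80% of the bitrate scores -20% regardless of where it sits on the rate axis.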
- Performance Metrics: A single EVC model runs at 30 FPS on 768x512 images, and the small model reaches 30 FPS on 1920x1080 images. The main model outperforms H.266/VVC in RD performance while remaining comparable to state-of-the-art neural codecs.
- Computational Efficiency: Latency measurements on NVIDIA GPUs show substantially lower latency than prior neural codecs, confirming that EVC delivers high throughput.
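The real-time figures translate into simple arithmetic: a 30 FPS target leaves about 33.3 ms per frame, and the pixel throughput it demands grows with resolution. The helpers below (hypothetical names) compute both for 768x512, the Kodak image size, and for 1920x1080. This summary does not say whether the FPS figures count encoding and decoding separately or together, so what the latency input represents is an assumption.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Maximum per-frame latency that still sustains target_fps."""
    return 1000.0 / target_fps


def required_pixel_rate(width: int, height: int, target_fps: float) -> float:
    """Pixels per second a codec must process to sustain target_fps at this resolution."""
    return width * height * target_fps


def achieved_fps(latency_ms: float) -> float:
    """Frames per second implied by a measured per-frame latency."""
    return 1000.0 / latency_ms


print(f"30 FPS budget: {frame_budget_ms(30):.1f} ms/frame")
for width, height in [(768, 512), (1920, 1080)]:
    mpix = required_pixel_rate(width, height, 30) / 1e6
    print(f"{width}x{height} at 30 FPS: {mpix:.1f} Mpixel/s")
```

A 1920x1080 frame has 2,073,600 pixels versus 393,216 for 768x512, so sustaining 30 FPS at full HD requires roughly 5.3 times the pixel throughput.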
Implications and Future Work
EVC is a significant step toward making neural image compression viable for real-time applications such as streaming, storage, and transmission, where efficient data handling is critical.
The paper points to several directions for future work: reducing redundancy in the decoder, automatically determining the optimal complexity of each layer in a compression model, and designing pruning algorithms that focus on the optimization process rather than on parameter importance. These directions could further improve the efficiency and adaptability of neural compression techniques.
In conclusion, EVC establishes a practical, high-performance, and efficient image codec, paving the way for future research and applications in the dynamic field of neural image compression.