Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules (2001.01568v3)

Published 6 Jan 2020 in eess.IV

Abstract: Image compression is a fundamental research field, and many well-known compression standards have been developed over the past decades. Recently, learned compression methods have exhibited a fast development trend with promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus the rate-distortion performance. Therefore, we propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which achieves a more accurate and flexible entropy model. Besides, we take advantage of recent attention modules and incorporate them into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets. To our knowledge, our approach is the first to achieve comparable performance with the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasant results when optimized by MS-SSIM. The project page is at https://github.com/ZhengxueCheng/Learned-Image-Compression-with-GMM-and-Attention

Authors (4)
  1. Zhengxue Cheng (29 papers)
  2. Heming Sun (39 papers)
  3. Masaru Takeuchi (9 papers)
  4. Jiro Katto (36 papers)
Citations (778)

Summary

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules

The paper "Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules" puts forward a novel approach to image compression, leveraging discretized Gaussian mixture likelihoods and attention modules to enhance performance. This paper addresses the persistent performance gap between traditional compression standards like Versatile Video Coding (VVC) and novel deep learning-based methods, as measured by widely used metrics such as Peak Signal-to-Noise Ratio (PSNR).

Introduction and Background

Image compression is pivotal to efficient image transmission and storage, with established standards such as JPEG, JPEG2000, and HEVC/H.265 underpinning much of current technology. Despite the historical success of these standards, rapid advances in high-resolution imaging and the proliferation of mobile devices highlight their limitations. Traditional approaches rely heavily on hand-crafted modules to minimize spatial redundancy and optimize coding efficiency, including the discrete cosine transform, quantization, and context-adaptive arithmetic coding.

In contrast, learned image compression methods using neural networks have demonstrated remarkable promise. Predominantly, these methods employ autoencoder architectures trained end-to-end. The literature acknowledges important milestones such as hyperprior models and autoregressive context models that improve entropy estimation. Nevertheless, a notable performance gap remains relative to standardized codecs.
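For context, such autoencoder-based codecs are typically trained on a joint rate-distortion objective. The following formulation is standard in this line of work (including hyperprior models) rather than quoted from the paper:

```latex
% Rate-distortion training objective for a hyperprior-based codec.
% \hat{y}, \hat{z}: quantized latents and hyper-latents;
% D: distortion (e.g., MSE or 1 - MS-SSIM); \lambda: trade-off weight.
\mathcal{L} =
  \underbrace{\mathbb{E}\!\left[-\log_2 p_{\hat{y}}(\hat{y}\mid\hat{z})\right]
            + \mathbb{E}\!\left[-\log_2 p_{\hat{z}}(\hat{z})\right]}_{\text{rate } R}
  \;+\; \lambda \, D(x, \hat{x})
```

The entropy model supplies $p_{\hat{y}}(\hat{y}\mid\hat{z})$, which is why its accuracy directly controls the rate term that gradients flow through during training.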

Methodology

This research identifies that accurate entropy models are crucial for optimizing neural network parameters, directly impacting rate-distortion performance. The authors propose to model the distribution of latent representations using discretized Gaussian mixture likelihoods, which offers greater flexibility and accuracy in entropy modeling.

Entropy Model

The essence of the proposed method lies in parameterizing the latent code distributions with discretized Gaussian mixtures. Earlier state-of-the-art methods use a single Gaussian per latent for entropy estimation; these are less effective at capturing the true distribution of latent codes because of their restrictive, fixed shape. By combining multiple Gaussian components into a Gaussian Mixture Model (GMM), the method captures a broader range of distribution shapes and thereby reduces spatial redundancy more effectively, as sketched below.
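A minimal sketch of the discretized mixture likelihood: each quantized latent is assigned the probability mass of its unit-width bin under a K-component Gaussian mixture. This is an illustrative NumPy/SciPy implementation of the idea, not the authors' code; the parameter names and the K=3 example values are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def discretized_gmm_likelihood(y_hat, weights, means, scales):
    """Probability of integer-quantized latents y_hat under a
    K-component Gaussian mixture, computed as the mass of the bin
    [y - 0.5, y + 0.5]. The trailing axis of weights/means/scales
    indexes the K components; weights sum to 1 over that axis."""
    y = np.asarray(y_hat)[..., None]          # broadcast against K
    upper = norm.cdf((y + 0.5 - means) / scales)
    lower = norm.cdf((y - 0.5 - means) / scales)
    return np.sum(weights * (upper - lower), axis=-1)

# Rate estimate in bits for a single latent with a K=3 mixture:
p = discretized_gmm_likelihood(
    2.0,
    weights=np.array([0.5, 0.3, 0.2]),
    means=np.array([0.0, 1.5, -2.0]),
    scales=np.array([1.0, 0.5, 2.0]),
)
bits = -np.log2(p)  # expected code length under arithmetic coding
```

In the paper's setting, the per-latent mixture parameters are predicted by the hyperprior and context model, so the entropy model can adapt its shape at each spatial location.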

Attention Modules

Incorporating attention mechanisms into the network architecture is another crucial enhancement. Attention modules enable the network to focus on complex regions, improving the coding efficiency without disproportionately increasing training complexity. The implementation adopts a simplified attention mechanism to balance performance and computational load.
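A minimal PyTorch-style sketch of such a trunk/mask attention block follows. The residual-unit design and the block count are assumptions for illustration; the paper simplifies an earlier attention design (dropping its heavier non-local operations), and this sketch follows the general trunk/mask pattern rather than the authors' exact module.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain 3x3 -> ReLU -> 3x3 residual unit (assumed design)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class SimplifiedAttention(nn.Module):
    """Trunk/mask attention: the mask branch ends in a 1x1 conv and
    a sigmoid, and its output gates the trunk features before the
    final residual connection, so the network can emphasize complex
    regions. Illustrative sketch, not the authors' exact module."""
    def __init__(self, ch, n_blocks=3):
        super().__init__()
        self.trunk = nn.Sequential(*[ResBlock(ch) for _ in range(n_blocks)])
        self.mask = nn.Sequential(
            *[ResBlock(ch) for _ in range(n_blocks)],
            nn.Conv2d(ch, ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x + self.trunk(x) * self.mask(x)
```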

Experimental Validation

The proposed method was evaluated on both the Kodak dataset and the high-resolution CLIC (CVPR Workshop and Challenge on Learned Image Compression) dataset. Experimental results confirmed the method's advantage over existing learned compression techniques and traditional codecs such as JPEG, JPEG2000, and HEVC.

Performance Metrics

  1. PSNR: The proposed method demonstrated PSNR values comparable to those of VVC, a significant achievement given that VVC was the latest standardized codec at the time.
  2. MS-SSIM: Models optimized for MS-SSIM (see the sketch of the distortion term after this list) not only achieved state-of-the-art results but also produced more visually appealing images. The enhanced perceptual quality underscores the practical effectiveness of including attention modules.
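For the perceptually optimized models, the distortion term in the rate-distortion objective is typically taken as 1 - MS-SSIM. A minimal sketch using the third-party pytorch_msssim package (an assumed implementation choice; the paper does not prescribe one):

```python
import torch
from pytorch_msssim import ms_ssim  # pip install pytorch-msssim

def msssim_distortion(x: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Perceptual distortion term: 1 - MS-SSIM, for images in [0, 1].
    MS-SSIM equals 1 for identical images, so the loss is 0 at optimum."""
    return 1.0 - ms_ssim(x_hat, x, data_range=1.0, size_average=True)
```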

Implications and Future Directions

This paper's findings hold substantial implications for the image compression domain. The proposed approach not only narrows the performance gap with established standards but also improves the perceptual quality of compressed images, which could be transformative for applications requiring both efficiency and high perceptual quality.

Future research could build on this work by exploring:

  • Adaptive Entropy Models: Refining entropy models to dynamically adjust to various content types in real-time scenarios.
  • Hybrid Codec Designs: Integrating the proposed neural network-based methods with traditional hand-crafted modules in a hybrid approach, potentially yielding even better performance.
  • Expanding Attention Mechanisms: Investigating more sophisticated attention mechanisms to further enhance the network's ability to focus on relevant regions, thereby improving compression efficiency and quality.

Conclusion

The paper presents a significant advancement in learned image compression, proposing a method that leverages discretized Gaussian mixture likelihoods and simplified attention modules. The approach achieves PSNR performance competitive with state-of-the-art codecs such as VVC and excels at producing visually pleasing images when optimized for MS-SSIM. This marks an important step toward the practical deployment of deep learning-based image compression techniques.