Overview of "Neural Image Compression via Non-Local Attention Optimization and Improved Context Modeling"
The paper introduces a learned image compression method dubbed NLAIC, short for Non-Local Attention optimization and Improved Context modeling, built on a deep neural network within a variational auto-encoder (VAE) framework. NLAIC aims to outperform existing methods in compression efficiency by capturing both local and global correlations: non-local network operations embedded in the encoder and decoder act as non-linear transforms that aggregate information well beyond a fixed receptive field.
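As background, VAE-based learned codecs of this kind are typically trained end to end against a rate-distortion objective, roughly loss = rate + λ · distortion. The sketch below, with hypothetical module names and a straight-through quantization surrogate, illustrates the general shape of such a training step; it is an assumption about the framework, not the paper's exact architecture or loss.

```python
# Hedged sketch of the rate-distortion objective used by VAE-style learned
# codecs. Module names (encoder, decoder, entropy_model) and the lambda value
# are placeholders, not components taken from the paper.
import torch

def rd_loss(x, encoder, decoder, entropy_model, lam=0.01):
    y = encoder(x)                                  # analysis transform -> latents
    y_hat = y + (torch.round(y) - y).detach()       # straight-through quantization surrogate
    likelihoods = entropy_model(y_hat)              # per-symbol probabilities in (0, 1]
    b, _, h, w = x.shape
    rate = -torch.log2(likelihoods).sum() / (b * h * w)  # estimated bits per pixel
    x_hat = decoder(y_hat)                          # synthesis transform -> reconstruction
    distortion = torch.mean((x - x_hat) ** 2)       # MSE; MS-SSIM is another common choice
    return rate + lam * distortion                  # rate-distortion trade-off
```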
Key Contributions
- Non-Local Network Operations: Non-local network operations serve as non-linear transforms that capture spatial correlations beyond the immediate neighbourhood, applied both in the main image autoencoder and in the hyperprior branch that encodes side information (hyperpriors).
- Attention Mechanism: An attention mechanism automatically generates masks that guide bit allocation according to the importance of image features: regions deemed significant receive more bits, improving rate-distortion efficiency (the first sketch after this list illustrates one such non-local attention module).
- Improved Context Modeling: A joint conditional entropy model based on a 3D convolutional neural network (CNN) combines autoregressive contexts with hyperpriors when estimating the probability of each latent symbol, capturing latent feature statistics more precisely and thereby lowering the bit rate (see the second sketch after this list).
- Implementation Practicalities: The paper discusses several enhancements for practical applications, such as parallel processing for context predictions using 3D CNNs, sparse non-local operations for reduced memory usage, and a unified model for various bitrate requirements without retraining.
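The following is a minimal sketch of a non-local attention module of the kind described above, assuming an embedded-Gaussian non-local block (Wang et al., 2018) and a sigmoid mask branch that gates the main features; the layer sizes and exact branch composition are illustrative assumptions, not the paper's precise configuration.

```python
# Minimal sketch of a non-local attention module for feature gating.
# Hypothetical layer sizes; the sigmoid mask branch follows the idea of
# attention-driven bit allocation, not the paper's exact layer layout.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NonLocalBlock(nn.Module):
    """Embedded-Gaussian non-local operation: every position attends to all others."""
    def __init__(self, channels):
        super().__init__()
        self.inter = channels // 2
        self.theta = nn.Conv2d(channels, self.inter, 1)
        self.phi = nn.Conv2d(channels, self.inter, 1)
        self.g = nn.Conv2d(channels, self.inter, 1)
        self.out = nn.Conv2d(self.inter, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)   # (b, hw, c')
        k = self.phi(x).flatten(2)                     # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)       # (b, hw, c')
        attn = F.softmax(q @ k, dim=-1)                # pairwise similarities across positions
        y = (attn @ v).transpose(1, 2).reshape(b, self.inter, h, w)
        return x + self.out(y)                         # residual connection

class NonLocalAttentionModule(nn.Module):
    """Main-branch features gated by a sigmoid mask derived from non-local context."""
    def __init__(self, channels):
        super().__init__()
        self.main = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                                  nn.Conv2d(channels, channels, 3, padding=1))
        self.mask = nn.Sequential(NonLocalBlock(channels),
                                  nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        # Important regions get mask values near 1, so more signal (and bits) survive.
        return x + self.main(x) * torch.sigmoid(self.mask(x))
```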
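The second sketch shows one way a causally masked 3D convolution can act as an autoregressive context model over the quantized latents and be fused with hyperprior features to predict per-symbol distribution parameters. The conditional Gaussian parameterization, layer widths, and fusion scheme are assumptions for illustration, not the paper's exact entropy model.

```python
# Hedged sketch of a masked 3D-CNN conditional entropy model: causal context
# over quantized latents plus hyperprior features -> Gaussian mean/scale.
import torch
import torch.nn as nn

class MaskedConv3d(nn.Conv3d):
    """3D convolution whose kernel only sees already-decoded positions (raster order)."""
    def __init__(self, *args, mask_type="A", **kwargs):
        super().__init__(*args, **kwargs)
        kd, kh, kw = self.kernel_size
        cd, ch, cw = kd // 2, kh // 2, kw // 2
        mask = torch.zeros_like(self.weight)
        mask[:, :, :cd] = 1                                 # earlier depth slices
        mask[:, :, cd, :ch] = 1                             # earlier rows in the centre slice
        mask[:, :, cd, ch, :cw + (mask_type == "B")] = 1    # earlier columns (centre only for "B")
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask                       # enforce causality at every call
        return super().forward(x)

class ConditionalEntropyModel(nn.Module):
    """Predicts mean/scale of each latent symbol from causal context + hyperprior."""
    def __init__(self, hyper_channels=64, hidden=32):
        super().__init__()
        self.context = MaskedConv3d(1, hidden, kernel_size=5, padding=2, mask_type="A")
        self.fuse = nn.Sequential(nn.Conv3d(hidden + hyper_channels, hidden, 1), nn.ReLU(),
                                  nn.Conv3d(hidden, 2, 1))  # -> mean and log-scale

    def forward(self, y_hat, hyper):
        # y_hat: quantized latents (b, 1, C, H, W); hyper: (b, hyper_channels, C, H, W)
        ctx = self.context(y_hat)
        mean, log_scale = self.fuse(torch.cat([ctx, hyper], dim=1)).chunk(2, dim=1)
        return mean, torch.exp(log_scale)                   # per-symbol Gaussian parameters
```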
Performance and Implications
Evaluated on benchmark datasets such as Kodak and CLIC, the proposed NLAIC algorithm outperforms leading conventional and learned image compression methods, achieving state-of-the-art coding efficiency. Results reported under both PSNR and MS-SSIM show significant BD-rate savings relative to standard codecs such as JPEG and BPG.
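For readers unfamiliar with the metric, BD-rate summarises the average bitrate difference between two codecs at equal quality by fitting and integrating their rate-distortion curves. The sketch below shows the standard Bjøntegaard-style calculation; the sample rate and PSNR points are placeholders for illustration, not figures from the paper.

```python
# Hedged sketch of a BD-rate computation (average % bitrate change at equal
# quality); the sample RD points are placeholders, not results from the paper.
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average % bitrate difference of the test codec vs. the anchor (negative = savings)."""
    lr_a, lr_t = np.log(rate_anchor), np.log(rate_test)
    # Fit log-rate as a cubic polynomial of quality for each curve.
    p_a = np.polyfit(psnr_anchor, lr_a, 3)
    p_t = np.polyfit(psnr_test, lr_t, 3)
    # Integrate both fits over the overlapping quality range.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_diff = (int_t - int_a) / (hi - lo)       # mean log-rate gap at equal quality
    return (np.exp(avg_diff) - 1) * 100          # percentage bitrate change

# Placeholder RD points (bpp, PSNR in dB) purely to illustrate the call:
print(bd_rate([0.2, 0.4, 0.8, 1.2], [30, 33, 36, 38],
              [0.18, 0.35, 0.7, 1.05], [30, 33, 36, 38]))
```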
Theoretical and Practical Implications
The paper establishes the potential for significant advances in image compression codecs by incorporating non-local operations and implicitly learned, attention-based bit allocation rather than hand-crafted importance maps. This direction aligns well with the needs of modern image communication, where quality preservation and bit efficiency are paramount. Such modeling frameworks may also serve as groundwork for video compression, where temporal correlations and scalability of model complexity become additional factors.
Future Directions
While the current focus is on image compression, the advances open the door to research in video compression, inviting exploration of temporal context modeling and optimization for low-resource devices. Furthermore, better alignment between objective quality metrics such as MS-SSIM and perceived visual quality remains an important question for comprehensive media encoding solutions.
In conclusion, the NLAIC algorithm notably enhances efficiency in image compression through its innovative combination of non-local attention-based optimization and advanced context modeling, suggesting pathways for even broader applications in media compression technologies.