MLIC: Multi-Reference Entropy Model for Learned Image Compression (2211.07273v10)

Published 14 Nov 2022 in eess.IV and cs.CV

Abstract: Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays a crucial role in boosting rate-distortion performance. However, most entropy models only capture correlations in one dimension, while the latent representation contains channel-wise, local spatial, and global spatial correlations. To tackle this issue, we propose the Multi-Reference Entropy Model (MEM) and the advanced version, MEM$^+$. These models capture the different types of correlations present in the latent representation. Specifically, we first divide the latent representation into slices. When decoding the current slice, we use previously decoded slices as context and employ the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we introduce two enhanced checkerboard context capturing techniques that avoid performance degradation. Based on MEM and MEM$^+$, we propose image compression models MLIC and MLIC$^+$. Extensive experimental evaluations demonstrate that our MLIC and MLIC$^+$ models achieve state-of-the-art performance, reducing BD-rate by $8.05\%$ and $11.39\%$ on the Kodak dataset compared to VTM-17.0 when measured in PSNR. Our code is available at https://github.com/JiangWeibeta/MLIC.

Multi-Reference Entropy Model for Learned Image Compression

The paper "MLIC: Multi-Reference Entropy Model for Learned Image Compression" targets a limitation of existing entropy models for learned image compression: most of them capture correlations along only one dimension of the latent representation. The authors propose the Multi-Reference Entropy Model (MEM) and its enhanced version, MEM$^+$, which improve rate-distortion performance by jointly capturing channel-wise, local spatial, and global spatial correlations.

Overview of the Proposed Methods

The primary innovation of this work is MEM and MEM$^+$, which condition the estimated distribution of each latent element on multiple references (channel-wise, local spatial, and global spatial contexts) rather than a single one, yielding a more accurate conditional entropy estimate. The entropy model is critical in learned image compression because it estimates the distribution of the latent representation, and the accuracy of that estimate determines how many bits the entropy coder spends.
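The link between a better conditional estimate and a lower bit cost can be illustrated with a small sketch. It assumes a Gaussian model integrated over unit-width quantization bins, which is a common choice in learned compression but is not confirmed here as the exact parameterization used by MLIC. The function names and the toy latents, means, and scales are hypothetical.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def symbol_bits(y_hat, mean, scale, min_likelihood=1e-9):
    """Estimated bits for one integer-quantized latent value.

    Assumption: the probability mass is a Gaussian integrated over the
    quantization bin [y_hat - 0.5, y_hat + 0.5].
    """
    p = gaussian_cdf(y_hat + 0.5, mean, scale) - gaussian_cdf(y_hat - 0.5, mean, scale)
    return -math.log2(max(p, min_likelihood))


def total_bits(latents, means, scales):
    """Sum of per-symbol bit estimates under the predicted parameters."""
    return sum(symbol_bits(y, m, s) for y, m, s in zip(latents, means, scales))


# Hypothetical quantized latents.
latents = [2, -1, 0, 3]

# No context: the same broad distribution for every element.
blind = total_bits(latents, [0.0] * 4, [2.0] * 4)

# With context: predicted means close to the true values, smaller scales.
informed = total_bits(latents, [1.8, -0.9, 0.1, 2.7], [0.5] * 4)

print(f"bits without context: {blind:.2f}")
print(f"bits with context:    {informed:.2f}")
```

In this toy setting the context-aware prediction costs noticeably fewer bits, which is the effect a multi-reference entropy model aims for at scale.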

Both MEM and MEM$^+$ start by dividing the latent representation into slices along the channel dimension. Each slice is decoded using previously decoded slices as context, which captures channel-wise correlations. Local contexts are captured with two enhanced checkerboard techniques, designed to avoid the performance degradation that parallel (checkerboard) decoding usually incurs. Global correlations in the current slice are predicted using the attention map of the previously decoded slice, which extends the context beyond the local neighborhood.
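The decoding order described above can be sketched as a schedule: slices are visited one after another, and within each slice a checkerboard pattern splits positions into two passes. This is a simplified illustration, not the authors' implementation. The function names, the equal-sized slices, the choice of which parity counts as "anchor", and the context labels are all assumptions, and the attention-based global context is omitted.

```python
def split_into_slices(num_channels, num_slices):
    """Return (start, end) channel ranges for equal-sized slices (an assumption)."""
    if num_channels % num_slices != 0:
        raise ValueError("num_channels must be divisible by num_slices")
    size = num_channels // num_slices
    return [(i * size, (i + 1) * size) for i in range(num_slices)]


def checkerboard_mask(height, width):
    """True marks anchor positions; here anchors are where (row + col) is even."""
    return [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]


def decoding_schedule(num_channels, num_slices):
    """List decode steps in order, with the context each step may use."""
    steps = []
    for i, (start, end) in enumerate(split_into_slices(num_channels, num_slices)):
        previous_slices = list(range(i))  # channel-wise context
        # Pass 1: anchor positions have no decoded neighbors in this slice.
        steps.append({
            "slice": i,
            "channels": (start, end),
            "part": "anchor",
            "channel_context": previous_slices,
            "local_context": "none",
        })
        # Pass 2: non-anchors can look at the anchors decoded in pass 1.
        steps.append({
            "slice": i,
            "channels": (start, end),
            "part": "non_anchor",
            "channel_context": previous_slices,
            "local_context": "anchor",
        })
    return steps


for row in checkerboard_mask(4, 4):
    print("".join("A" if is_anchor else "." for is_anchor in row))

for step in decoding_schedule(num_channels=8, num_slices=4):
    print(step)
```

Every position in a pass can be decoded in parallel, so the number of sequential steps grows with the number of slices rather than with the image size.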

In practice, the paper implements these entropy models in the MLIC and MLIC$^+$ image compression systems. Experimental evaluations confirm their improvements over traditional coding methods like VVC and show performance comparable to or better than other advanced learned image compression models. Particularly noteworthy are the BD-rate reductions of 8.05% for MLIC and 11.39% for MLIC$^+$ on the Kodak dataset, measured in PSNR relative to VTM-17.0.
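For readers unfamiliar with the metric, BD-rate is the average bitrate difference between two codecs at equal quality, computed in the log-rate domain over the overlapping quality range. The sketch below is a simplification: standard BD-rate tools fit a cubic polynomial to each rate-distortion curve, whereas this version uses piecewise-linear interpolation to stay dependency-free. The function names and the rate-distortion points are hypothetical and are not results from the paper.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("x is outside the interpolation range")


def bd_rate(anchor_rates, anchor_psnrs, test_rates, test_psnrs, samples=1000):
    """Approximate BD-rate in percent; negative means the test codec saves bits."""
    anchor = sorted(zip(anchor_psnrs, anchor_rates))
    test = sorted(zip(test_psnrs, test_rates))
    a_q = [q for q, _ in anchor]
    a_lr = [math.log(r) for _, r in anchor]
    t_q = [q for q, _ in test]
    t_lr = [math.log(r) for _, r in test]

    # Only the PSNR range covered by both curves is comparable.
    lo = max(a_q[0], t_q[0])
    hi = min(a_q[-1], t_q[-1])
    if hi <= lo:
        raise ValueError("the two curves do not overlap in PSNR")

    # Midpoint-rule average of the log-rate difference over [lo, hi].
    total = 0.0
    for k in range(samples):
        q = lo + (hi - lo) * (k + 0.5) / samples
        total += _interp(q, t_q, t_lr) - _interp(q, a_q, a_lr)
    avg_log_diff = total / samples
    return (math.exp(avg_log_diff) - 1.0) * 100.0


# Hypothetical rate (bpp) / PSNR (dB) points, for illustration only.
anchor_rates = [0.2, 0.4, 0.8, 1.6]
anchor_psnrs = [30.0, 33.0, 36.0, 39.0]
test_rates = [r * 0.9 for r in anchor_rates]  # 10% fewer bits at equal PSNR

print(f"BD-rate: {bd_rate(anchor_rates, anchor_psnrs, test_rates, anchor_psnrs):.2f}%")
```

Under this sign convention, a reported "reduction of 8.05%" corresponds to a BD-rate of about -8.05 against the anchor codec.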

Numerical Results and Performance Implications

The proposed models demonstrate state-of-the-art performance in terms of both rate-distortion and perceptual quality metrics across various datasets, including Kodak, Tecnick, and others. The substantial BD-rate reductions highlight the efficiency of MEM and MEM$^+$ in the learned image compression landscape.

By using a more complex entropy model that effectively captures correlations across multiple dimensions, MLIC and MLIC$^+$ achieve better compression ratios and image quality than traditional methods. This capability is particularly advantageous given the increasing demands for effective image compression amidst growing digital media consumption.

Theoretical and Practical Implications

From a theoretical standpoint, the research underscores the importance of context in entropy modeling and indicates potential areas for further exploration in multi-dimensional correlation capturing techniques. The proposed models facilitate an understanding of correlation dynamics in compressed image representations, which could guide future development in both image and video compression domains.

Practically, the proposed models offer viable alternatives and enhancements to existing compression frameworks, especially those reliant on learned methods. The modularity of MEM and MEM$^+$ lends itself to integration within different architectures, possibly aiding the development of adaptive compression systems tailored to specific application requirements.

Future Directions

The promising results showcased by the MLIC and MLIC$^+$ compression systems open several avenues for future research. For one, exploration into lighter-weight variants and optimization for real-time applications could address operational constraints posed by computationally intensive modules. Moreover, the integration of these models within broader multimedia processing pipelines could enhance system interoperability and efficiency.

In conclusion, the paper provides a significant contribution to the field of learned image compression, presenting a sophisticated approach to entropy modeling that advances both theoretical understanding and practical application capabilities.

Authors (6)

Wei Jiang (343 papers)
Jiayu Yang (32 papers)
Yongqi Zhai (12 papers)
Peirong Ning (4 papers)
Feng Gao (240 papers)
Ronggang Wang (45 papers)

Citations (48)

GitHub

GitHub - JiangWeibeta/MLIC: Multi-Reference Entropy Models for Learned Image Compression (72 stars)