
Joint Autoregressive and Hierarchical Priors for Learned Image Compression (1809.02736v1)

Published 8 Sep 2018 in cs.CV

Abstract: Recent models for learned image compression are based on autoencoders, learning approximately invertible mappings from pixels to a quantized latent representation. These are combined with an entropy model, a prior on the latent representation that can be used with standard arithmetic coding algorithms to yield a compressed bitstream. Recently, hierarchical entropy models have been introduced as a way to exploit more structure in the latents than simple fully factorized priors, improving compression performance while maintaining end-to-end optimization. Inspired by the success of autoregressive priors in probabilistic generative models, we examine autoregressive, hierarchical, as well as combined priors as alternatives, weighing their costs and benefits in the context of image compression. While it is well known that autoregressive models come with a significant computational penalty, we find that in terms of compression performance, autoregressive and hierarchical priors are complementary and, together, exploit the probabilistic structure in the latents better than all previous learned models. The combined model yields state-of-the-art rate--distortion performance, providing a 15.8% average reduction in file size over the previous state-of-the-art method based on deep learning, which corresponds to a 59.8% size reduction over JPEG, more than 35% reduction compared to WebP and JPEG2000, and bitstreams 8.4% smaller than BPG, the current state-of-the-art image codec. To the best of our knowledge, our model is the first learning-based method to outperform BPG on both PSNR and MS-SSIM distortion metrics.

Joint Autoregressive and Hierarchical Priors for Learned Image Compression: An Expert Overview

Introduction

In learned image compression, contemporary methods use autoencoders to transform pixel data into a quantized latent representation and employ entropy models for efficient encoding. The paper by Minnen et al. evaluates autoregressive priors, hierarchical priors, and their combination within such frameworks, analyzing the compression-performance implications of each and presenting a thorough comparison against state-of-the-art codecs.

Methodology and Models

Minnen et al. build on prior work that uses Gaussian Scale Mixture (GSM) entropy models, in which scale parameters are conditioned on a learned hyperprior. The key extensions are:

  • Generalization to a Gaussian Mixture Model (GMM): This model augments the GSM by predicting both mean and scale parameters conditioned on the hyperprior (a minimal rate-estimation sketch follows this list).
  • Incorporation of an Autoregressive Component: Inspired by probabilistic generative models, an autoregressive context model is added to further improve compression performance by exploiting causal context in the latents.
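
To make the mean-and-scale conditional Gaussian prior concrete, the snippet below is a minimal PyTorch sketch of how the expected bit cost of quantized latents can be estimated once the entropy model has produced a mean and scale for each element. The function name and the unit-interval CDF convention are illustrative assumptions, not the authors' code.

```python
import torch

def estimated_bits(y_hat, mean, scale, eps=1e-9):
    """Estimate the bit cost of integer-quantized latents y_hat under a
    Gaussian prior with predicted per-element mean and scale.

    The probability mass of each quantized symbol is taken as the Gaussian
    CDF evaluated over the unit interval around it, a common convention in
    learned-compression implementations.
    """
    scale = scale.clamp(min=1e-6)                 # avoid degenerate scales
    gaussian = torch.distributions.Normal(mean, scale)
    # P(y_hat) = CDF(y_hat + 0.5) - CDF(y_hat - 0.5)
    pmf = gaussian.cdf(y_hat + 0.5) - gaussian.cdf(y_hat - 0.5)
    return (-torch.log2(pmf + eps)).sum()         # total estimated bits
```

During training, the same estimate can be applied to noise-relaxed latents to obtain a differentiable rate term.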

These extensions are embedded within an overarching autoencoder framework. The compression process is realized by optimizing a rate--distortion objective: $J = \mathbb{E}_{\bm{x} \sim p_{\bm{x}}}\left[-\log_2 p_{\hat{\bm{y}}}\left(\lfloor f(\bm{x}) \rceil\right)\right] + \lambda\, \mathbb{E}_{\bm{x} \sim p_{\bm{x}}}\left[d\left(\bm{x}, g(\lfloor f(\bm{x}) \rceil)\right)\right]$, where $f$ and $g$ denote the analysis (encoder) and synthesis (decoder) transforms, $\lfloor \cdot \rceil$ denotes quantization by rounding, $p_{\hat{\bm{y}}}$ is the entropy model over the quantized latents, and $\lambda$ controls the trade-off between rate and the distortion measure $d$.
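
As a rough illustration of how such an objective is optimized in practice, the following PyTorch-style sketch combines an estimated rate term with a distortion term. The callables and the uniform-noise quantization proxy are standard assumptions in this line of work rather than details taken from the paper.

```python
import torch

def rd_loss(x, encoder, decoder, entropy_model, lam):
    """One evaluation of the rate--distortion objective J = R + lambda * D.

    `encoder`, `decoder`, and `entropy_model` are placeholder callables; during
    training, rounding is typically relaxed to additive uniform noise so the
    objective stays differentiable.
    """
    y = encoder(x)                                          # analysis transform f(x)
    y_tilde = y + torch.empty_like(y).uniform_(-0.5, 0.5)   # noisy proxy for rounding
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = entropy_model(y_tilde) / num_pixels              # estimated bits per pixel
    x_hat = decoder(y_tilde)                                # synthesis transform g(.)
    distortion = torch.mean((x - x_hat) ** 2)               # MSE, e.g. for PSNR-optimized models
    return rate + lam * distortion
```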

Architecture

The proposed architecture is a combination of two sub-networks:

  • Autoencoder Network: An encoder that maps the image to a latent representation, which is then quantized, and a decoder that reconstructs the image from the quantized latents.
  • Entropy Model Network: A blend of a hyperprior network and an autoregressive context model, integrated to predict mean and scale parameters for a conditional Gaussian entropy model.

The comprehensive system ensures that, post-training, the encoded bitstream encapsulates all necessary information for accurate image reconstruction by the decoder.
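
A compact sketch of how the two entropy-model components might be wired together is given below: a PixelCNN-style masked convolution supplies causal context from already-decoded latents, and its output is fused with hyper-decoder features to predict the mean and scale of the conditional Gaussian. Layer widths, the masking scheme, and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedConv2d(nn.Conv2d):
    """2D convolution whose kernel is masked so each output position only
    sees previously decoded (causal) spatial positions, as in PixelCNN-style
    context models."""
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        mask = torch.ones_like(self.weight)
        _, _, kh, kw = self.weight.shape
        mask[:, :, kh // 2, kw // 2:] = 0   # mask center and right of center
        mask[:, :, kh // 2 + 1:, :] = 0     # mask all rows below center
        self.register_buffer("mask", mask)

    def forward(self, x):
        self.weight.data *= self.mask       # re-apply causal mask each call
        return super().forward(x)

class JointEntropyModel(nn.Module):
    """Sketch of the joint prior: hyper-decoder features and causal context
    features are fused to predict per-latent mean and scale for a conditional
    Gaussian entropy model. Channel counts are illustrative only."""
    def __init__(self, latent_ch=192, hyper_ch=192):
        super().__init__()
        self.context = MaskedConv2d(latent_ch, 2 * latent_ch, 5, padding=2)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * latent_ch + hyper_ch, 640, 1), nn.LeakyReLU(),
            nn.Conv2d(640, 512, 1), nn.LeakyReLU(),
            nn.Conv2d(512, 2 * latent_ch, 1),   # -> (mean, scale) per latent channel
        )

    def forward(self, y_hat, hyper_features):
        ctx = self.context(y_hat)
        params = self.fuse(torch.cat([ctx, hyper_features], dim=1))
        mean, scale = params.chunk(2, dim=1)
        return mean, F.softplus(scale)          # keep predicted scales positive
```

Because the context model is causal, decoding must proceed sequentially over spatial positions, which is the computational penalty of autoregressive priors discussed in the paper.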

Experimental Results

Quantitative evaluations, reported primarily on the Kodak dataset, demonstrate the efficacy of the proposed method. The combined autoregressive and hierarchical model exhibits a significant improvement in rate--distortion performance:

  • File Size Reduction: A 15.8% average reduction relative to the previous state-of-the-art deep-learning-based method, corresponding to a 59.8% size reduction over JPEG; reductions exceed 35% relative to WebP and JPEG2000, and bitstreams are 8.4% smaller than BPG.
  • Image Quality Metrics: The combined model surpasses BPG on both PSNR and MS-SSIM, making it, to the authors' knowledge, the first learning-based method to outperform BPG, the previous benchmark codec, on both metrics.

Implications

The findings highlight the complementary nature of autoregressive and hyperprior models:

  • Theoretical: From an information-theoretic perspective, the autoregressive component can leverage spatial dependencies in image data, while the hyperprior provides a global context that further refines the entropy model.
  • Practical: Despite the computational overhead introduced by autoregressive models, the significant bit rate reduction and quality gains justify their inclusion. Future developments may focus on optimizing these models for computational efficiency without compromising performance.

Future Directions

Potential avenues for further research include parallelization schemes for autoregressive models and tighter coupling with arithmetic coders. Additionally, more sophisticated hierarchical models may further improve compression efficiency while maintaining theoretical elegance. Addressing real-time constraints and hardware acceleration could also enhance the practical applicability of these advanced priors in real-world applications.

Conclusion

Minnen et al.'s exploration into combined autoregressive and hierarchical priors marks a substantial step in learned image compression. The paper underscores the benefits of integrating both priors, achieving superior rate--distortion performance compared to traditional and contemporary codecs. This duality leverages the strengths of local and global dependencies, heralding a new paradigm in image compression research.

Authors (3)
  1. David Minnen (19 papers)
  2. Johannes Ballé (29 papers)
  3. George Toderici (22 papers)
Citations (1,130)