
End-to-end Optimized Image Compression (1611.01704v3)

Published 5 Nov 2016 in cs.CV, cs.IT, and math.IT

Abstract: We describe an image compression method, consisting of a nonlinear analysis transformation, a uniform quantizer, and a nonlinear synthesis transformation. The transforms are constructed in three successive stages of convolutional linear filters and nonlinear activation functions. Unlike most convolutional neural networks, the joint nonlinearity is chosen to implement a form of local gain control, inspired by those used to model biological neurons. Using a variant of stochastic gradient descent, we jointly optimize the entire model for rate-distortion performance over a database of training images, introducing a continuous proxy for the discontinuous loss function arising from the quantizer. Under certain conditions, the relaxed loss function may be interpreted as the log likelihood of a generative model, as implemented by a variational autoencoder. Unlike these models, however, the compression model must operate at any given point along the rate-distortion curve, as specified by a trade-off parameter. Across an independent set of test images, we find that the optimized method generally exhibits better rate-distortion performance than the standard JPEG and JPEG 2000 compression methods. More importantly, we observe a dramatic improvement in visual quality for all images at all bit rates, which is supported by objective quality estimates using MS-SSIM.

End-to-end Optimized Image Compression

This paper presents a novel image compression method based on an end-to-end optimization framework that directly aligns with the rate-distortion criterion. The proposed method utilizes a cascade of nonlinear analysis, quantization, and synthesis transformations, differing significantly from traditional linear transform coding techniques such as JPEG and JPEG 2000.

Methodology and Key Components

The presented compression method involves a sequence of convolutional linear filters and nonlinear activation functions. The transforms are inherently nonlinear and are inspired by the local gain control mechanisms found in biological neurons. This design choice allows for a more adaptive and precise transformation from the image space to the latent code space and back.

Analysis and Synthesis Transforms

The nonlinear analysis transform, denoted g_a, maps the image into a latent code space, where a uniform quantizer discretizes the continuous values. The synthesis transform, g_s, maps the quantized values back into the image space. Both transforms consist of multiple stages of convolution and subsampling operations, interspersed with the Generalized Divisive Normalization (GDN) nonlinearity and its approximate inverse (IGDN). The GDN/IGDN nonlinearities contribute substantially to rate-distortion performance by effectively Gaussianizing local image statistics, which makes the latent representation easier to model and code.
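To make the normalization concrete, below is a minimal PyTorch sketch of a simplified GDN layer: each channel is divided by a learned norm computed from the squared activations of all channels at the same spatial location. The initialization values and fixed exponents here are illustrative assumptions, not the paper's exact parameterization, and a production implementation would additionally constrain beta and gamma to remain positive during training.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGDN(nn.Module):
    """Simplified Generalized Divisive Normalization:
    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2),
    applied across channels at each spatial location."""

    def __init__(self, channels: int):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))        # per-channel offset
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))  # cross-channel weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W); gamma acts as a 1x1 convolution
        # over the squared inputs, with beta as the bias term.
        c = x.shape[1]
        norm = F.conv2d(x * x, self.gamma.view(c, c, 1, 1), bias=self.beta)
        return x * torch.rsqrt(norm)
```

The approximate inverse (IGDN) used in the synthesis transform multiplies by the same kind of norm instead of dividing by it.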

Joint Optimization

The parameters of the analysis and synthesis transforms are jointly optimized using a variant of stochastic gradient descent. The optimization objective is a weighted sum of the rate and distortion terms, R + λD, where λ controls the trade-off between them. The challenge posed by the zero gradients of the quantization step is mitigated by using a continuous proxy loss function: during training, quantization is approximated by additive uniform noise, which makes the objective differentiable and gradient computation efficient.
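The sketch below shows the shape of this relaxed training objective, assuming hypothetical `analysis`, `synthesis`, and `entropy_model` modules (the names are placeholders, not the authors' API). The rate term is the negative log-likelihood of the noisy code under a learned density model, and the distortion term is mean squared error, matching the paper's setup.

```python
import torch

def rate_distortion_loss(x, analysis, synthesis, entropy_model, lam):
    """Relaxed R + lambda * D objective (module names are hypothetical).

    During training, rounding is replaced by additive uniform noise on
    [-0.5, 0.5], which keeps the loss differentiable while matching the
    marginal statistics of quantization error."""
    y = analysis(x)                           # nonlinear analysis transform g_a
    y_tilde = y + (torch.rand_like(y) - 0.5)  # continuous proxy for quantization
    x_hat = synthesis(y_tilde)                # nonlinear synthesis transform g_s

    # Rate: expected code length, approximated by -log2 p(y_tilde)
    # under the learned entropy model, normalized per input element.
    rate = -torch.log2(entropy_model(y_tilde)).sum() / x.numel()

    # Distortion: mean squared error in image space.
    distortion = torch.mean((x - x_hat) ** 2)
    return rate + lam * distortion
```

At test time the noise is replaced by actual rounding, and the rounded coefficients are entropy coded.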

Experimental Validation and Results

The proposed model is trained and evaluated on subsets of widely recognized image datasets (e.g., ImageNet). Through extensive experimentation, the paper demonstrates that the proposed method consistently outperforms the traditional JPEG and JPEG 2000 schemes in rate-distortion performance, especially at lower bit rates. The visual quality of the reconstructed images improves substantially across images and compression rates, as corroborated by the multi-scale structural similarity (MS-SSIM) metric.

Key Findings

  1. Rate-Distortion Performance: The method exhibits superior performance on the rate-distortion curve compared to JPEG and JPEG 2000. This is particularly evident at lower bit rates where the perceptual quality of the reconstructed images is markedly improved.
  2. Visual Quality: Unlike JPEG and JPEG 2000, which produce visual artifacts such as blocking and ringing, the proposed method maintains smooth contours and edges more effectively.
  3. Entropy Coding Efficiency: An adaptive entropy code keeps the actual bit rate close to the entropy of the discrete representation, so little rate is wasted relative to the learned density model (see the sketch after this list).
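To make the third finding concrete, the standalone snippet below (an illustration, not the paper's coder) estimates the empirical entropy of a quantized representation from a histogram of its symbols; an ideal adaptive entropy coder approaches this value in bits per element.

```python
import torch

def empirical_entropy_bits(q: torch.Tensor) -> float:
    """Empirical entropy (bits per element) of a tensor of quantized
    symbols; this lower-bounds the average code length an ideal
    entropy coder can achieve for the representation."""
    _, counts = torch.unique(q, return_counts=True)
    p = counts.float() / q.numel()
    return float(-(p * torch.log2(p)).sum())

# Example on a hypothetical latent tensor of rounded coefficients.
y = torch.randn(1, 192, 16, 16) * 3.0
print(empirical_entropy_bits(torch.round(y)))
```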

Implications and Future Developments

This work has substantial implications both in the theoretical and practical domains of image compression:

  • Theoretical: The integration of end-to-end optimization techniques, inspired by biological neural systems, provides a novel perspective on designing image compression algorithms. This might influence future research on unsupervised learning and density estimation tasks.
  • Practical: The algorithm's efficiency and the significant improvement in visual quality suggest its potential applicability in environments where bandwidth and storage are at a premium, such as mobile and embedded systems.

Conclusion and Speculation on Future Developments

The proposed end-to-end optimized image compression framework signifies a meaningful advancement over traditional linear transform-based methods. Future research directions might include:

  • Extending to Video Compression: Adapting the architecture to temporal sequences and exploring how these transformations handle motion and temporal redundancies.
  • Alternative Nonlinearities: Exploring other forms of nonlinearities and activation functions to refine and potentially surpass the current rate-distortion performance.
  • Perceptual Metrics: Incorporating more advanced perceptual metrics beyond MSE during the optimization phase to further enhance the visual quality of the reconstructed images.

The alignment with variational autoencoders presents an interesting avenue for bridging model-inspired and data-driven approaches across various domains, encouraging further interdisciplinary research in machine learning and signal processing.

This work sets the stage for transformative advances in the field of image compression, poised to influence both academic research and practical applications significantly.

Authors (3)
  1. Johannes Ballé (29 papers)
  2. Valero Laparra (46 papers)
  3. Eero P. Simoncelli (33 papers)
Citations (1,553)