Deep Convolutional AutoEncoder-based Lossy Image Compression (1804.09535v1)

Published 25 Apr 2018 in cs.CV

Abstract: Image compression has been investigated as a fundamental research topic for many decades. Recently, deep learning has achieved great success in many computer vision tasks, and is gradually being used in image compression. In this paper, we present a lossy image compression architecture, which utilizes the advantages of convolutional autoencoder (CAE) to achieve a high coding efficiency. First, we design a novel CAE architecture to replace the conventional transforms and train this CAE using a rate-distortion loss function. Second, to generate a more energy-compact representation, we utilize the principal components analysis (PCA) to rotate the feature maps produced by the CAE, and then apply the quantization and entropy coder to generate the codes. Experimental results demonstrate that our method outperforms traditional image coding algorithms, by achieving a 13.7% BD-rate decrement on the Kodak database images compared to JPEG2000. Besides, our method maintains a moderate complexity similar to JPEG2000.

Authors (4)
  1. Zhengxue Cheng (29 papers)
  2. Heming Sun (39 papers)
  3. Masaru Takeuchi (9 papers)
  4. Jiro Katto (36 papers)
Citations (167)

Summary

Deep Convolutional AutoEncoder-based Lossy Image Compression

The paper "Deep Convolutional AutoEncoder-based Lossy Image Compression" by Zhengxue Cheng et al. presents a novel approach to image compression that leverages deep learning, specifically convolutional autoencoders (CAE). For decades, traditional codecs such as JPEG and JPEG2000 have dominated the digital landscape, relying on fixed transforms such as the DCT and wavelet transforms. While effective, these hand-crafted algorithms are limited in adaptability and optimization, particularly when confronted with diverse image content and formats.

The presented approach employs a deep learning architecture tailored for lossy image compression, optimized for coding efficiency. The core advances are twofold. First, a symmetric CAE architecture replaces the conventional transforms, generating low-dimensional feature maps through multiple downsampling and upsampling units. The CAE is trained with an approximated rate-distortion loss function, balancing compression rate against image fidelity.
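This pipeline can be sketched end to end. The snippet below is a minimal illustration, not the paper's implementation: average pooling stands in for the strided convolution units, nearest-neighbour repetition stands in for the mirrored upsampling units, the empirical entropy of the quantized latent approximates the rate term R, and the weight `lam` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x):
    # stand-in for a stride-2 convolution unit: 2x2 average pooling
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    # stand-in for the mirrored upsampling unit: nearest-neighbour repetition
    return x.repeat(2, axis=0).repeat(2, axis=1)

def entropy_bits(symbols):
    # empirical entropy of the quantized code: a proxy for the rate R
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def rate_distortion_loss(x, levels=3, lam=0.1):
    # encoder: symmetric stack of downsampling units -> low-dimensional latent
    y = x
    for _ in range(levels):
        y = downsample(y)
    y_hat = np.round(y)                     # scalar quantization
    # decoder: mirrored stack of upsampling units
    x_hat = y_hat
    for _ in range(levels):
        x_hat = upsample(x_hat)
    D = float(np.mean((x - x_hat) ** 2))    # distortion term (MSE)
    R = entropy_bits(y_hat)                 # approximate rate term
    return R + lam * D, y_hat.shape         # J = R + lambda * D

x = rng.normal(size=(64, 64))
J, latent_shape = rate_distortion_loss(x)
print(latent_shape)  # (8, 8): 64 -> 32 -> 16 -> 8
```

In the actual CAE, the pooling and repetition steps are learned convolutions, and the loss is minimized by gradient descent; the sketch only shows how the symmetric structure and the rate-distortion objective fit together.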

Second, the paper introduces a principal components analysis (PCA) based rotation to improve the energy compaction of the feature maps. Rotating the feature maps concentrates their energy into a few components, producing many near-zero coefficients that benefit the subsequent quantization and entropy coding. The PCA step systematically decorrelates the data, improving compression efficiency while keeping computational complexity comparable to JPEG2000. Experimental results confirm the competitiveness of the proposed method, demonstrating a 13.7% BD-rate reduction against JPEG2000 on the Kodak database images, i.e., better rate-distortion performance at a comparable computational cost.
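The effect of the rotation can be demonstrated on synthetic correlated feature maps. This is an illustrative sketch, not the paper's procedure: the channel count, correlation model, and quantization step are made up, and an eigendecomposition of the channel covariance stands in for the PCA rotation. Rotating into the eigenbasis concentrates energy into the leading components, so uniform quantization produces far more zeros.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy feature maps: 32 channels of 16x16, strongly correlated across channels
base = rng.normal(size=(16 * 16,))
maps = np.stack(
    [base * (0.9 ** k) + 0.05 * rng.normal(size=base.shape) for k in range(32)],
    axis=1,
)  # shape (256, 32): one row per spatial position, one column per channel

# PCA rotation: eigenvectors of the channel covariance decorrelate the channels
cov = np.cov(maps, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)          # ascending eigenvalues
order = np.argsort(eigvals)[::-1]               # sort components by energy
rotated = maps @ eigvecs[:, order]              # energy-compact representation

# uniform scalar quantization before vs. after the rotation
step = 0.5
zeros_before = int(np.sum(np.round(maps / step) == 0))
zeros_after = int(np.sum(np.round(rotated / step) == 0))
print(zeros_before, zeros_after)
```

After the rotation nearly all the energy sits in the first few columns, so the remaining columns quantize to runs of zeros that an entropy coder can represent very cheaply.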

Technically, the CAE uses the Parametric Rectified Linear Unit (PReLU) as its activation function, improving reconstruction quality, especially at higher bit rates. Moreover, the layered architecture performs dimension reduction and reconstruction symmetrically, suggesting broader applicability across varied media formats. This points to a potential paradigm shift in media handling: deep learning methods allow far faster adaptation cycles than the historically lengthy codec standardization process.
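PReLU behaves like ReLU for positive inputs but applies a learnable slope `a` on the negative side, so negative activations still carry gradient. A minimal sketch follows; `a = 0.25` is the common initialization, whereas in a trained CAE the slope is a learned (typically per-channel) parameter.

```python
import numpy as np

def prelu(x, a=0.25):
    # identity for x > 0, learnable negative slope a otherwise
    return np.where(x > 0, x, a * x)

x = np.array([-2.0, -0.4, 0.0, 1.5])
print(prelu(x))  # -> [-0.5, -0.1, 0.0, 1.5]
```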

The implications of this research span both theory and practice. Theoretically, it reinforces the capability of neural networks to reduce dimensionality while maintaining stringent image-quality metrics. Practically, the combination of autoencoders and PCA in a compression pipeline offers an avenue toward more efficient coding for emerging multimedia formats such as VR and 360-degree media.

Future developments are likely to revolve around integrating perceptual quality metrics such as MS-SSIM into the CAE training process, and around exploring generative adversarial networks (GANs), which may further improve the fidelity-compression trade-off.

In summary, the proposed CAE-based system marks a strong step toward learned image codec designs, leveraging the adaptability of deep learning over traditional hand-crafted transforms. The work lays groundwork for subsequent investigations into deep learning's role in high-efficiency media coding, establishing a reference point for future development of AI-driven image processing frameworks.