A Comprehensive Analysis of "An End-to-End Compression Framework Based on Convolutional Neural Networks"
This paper presents a compression framework that uses convolutional neural networks (CNNs) to improve image compression, a low-level vision task that has received far less attention than high-level tasks such as image recognition and understanding. The framework couples two CNNs, a compact convolutional neural network (ComCNN) and a reconstruction convolutional neural network (RecCNN), to preserve image quality while achieving reduced bit rates.
Framework Overview
ComCNN learns a compact representation of the input image that preserves the structural information needed for faithful reconstruction; this representation is then encoded with a standard codec such as JPEG, JPEG2000, or BPG. RecCNN subsequently reconstructs a high-quality image from the decoded output. The two networks are trained jointly with a unified learning algorithm that works around the non-differentiable rounding function in the quantization stage of existing codecs.
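To make the structure concrete, below is a minimal PyTorch sketch of the two networks. The layer counts and residual formulation follow the paper's description (ComCNN: three convolutional layers with stride-2 downsampling; RecCNN: a 20-layer DnCNN-style residual network), but details such as the 64-channel feature width and the exact layer composition are illustrative assumptions, not a verbatim reproduction of the authors' configuration.

```python
import torch
import torch.nn as nn

class ComCNN(nn.Module):
    """Compact-representation network: three conv layers, with a
    stride-2 layer that downsamples the input (a sketch)."""
    def __init__(self, channels=1, features=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, features, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(features, channels, 3, padding=1),  # compact representation
        )

    def forward(self, x):
        return self.net(x)

class RecCNN(nn.Module):
    """Reconstruction network: a 20-layer DnCNN-style architecture
    that predicts compression artifacts and removes them (a sketch)."""
    def __init__(self, channels=1, features=64, depth=20):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1),
                       nn.BatchNorm2d(features), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x_decoded):
        # Residual learning: estimate the artifacts, then subtract them.
        return x_decoded - self.net(x_decoded)
```

In the full pipeline, ComCNN's output is passed through the standard codec (encode, then decode) and upsampled back to the original resolution before RecCNN performs the final reconstruction.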
Experimental Results and Insights
Experiments confirm the effectiveness of the two-CNN framework: the proposed method substantially outperforms traditional coding standards combined with state-of-the-art deblocking or denoising methods. At a JPEG quality factor (QF) of 5, it achieves an average PSNR gain of 1.20 dB and an SSIM improvement of 0.0227 over the best traditional method. JPEG2000 and BPG show similarly large gains, with average PSNR improvements of up to 3.06 dB on some datasets.
These results show that the method preserves high-frequency detail and produces reconstructions with sharp edges. Because the framework remains compatible with existing codecs (JPEG, JPEG2000, BPG), it can be deployed across diverse systems without overhauling existing infrastructure.
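For context, PSNR compares mean squared error against the peak signal value on a logarithmic scale, so the reported 1.20 dB gain corresponds to roughly a 24% reduction in MSE (10^0.12 ≈ 1.32). A minimal NumPy implementation of the metric:

```python
import numpy as np

def psnr(reference: np.ndarray, reconstruction: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, assuming 8-bit images by default."""
    mse = np.mean((reference.astype(np.float64) - reconstruction.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```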
Theoretical and Practical Implications
Theoretically, this work advances the integration of deep learning with classical image compression, showing how CNNs and traditional codecs can complement each other within a single model. Practically, it offers a scalable, efficient way to improve image quality at low bit rates, which matters most in bandwidth-constrained environments.
The framework's success suggests further research directions in low-level vision, such as deeper or more complex network architectures and hybrid models that incorporate other machine-learning methods. Moreover, the optimization scheme developed to handle the non-differentiability of quantization may transfer to other deep-learning problems involving non-differentiable operations; a related sketch is given below.
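The core obstacle is that rounding has zero gradient almost everywhere, which blocks end-to-end backpropagation through the codec. The paper derives its own approximation for the full encode-decode step; a related and widely used workaround, shown here purely for illustration, is the straight-through estimator, which rounds in the forward pass but passes gradients through unchanged.

```python
import torch

class RoundSTE(torch.autograd.Function):
    """Rounding with a straight-through gradient: the forward pass
    quantizes, the backward pass treats rounding as the identity.
    An illustrative stand-in, not the paper's exact derivation."""

    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass the gradient through unchanged

quantize = RoundSTE.apply

# Usage: gradients flow to x despite the non-differentiable rounding.
x = torch.randn(4, requires_grad=True)
quantize(x).sum().backward()
print(x.grad)  # all ones
```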
Conclusion
This paper delivers a robust compression framework that leverages CNNs to surpass existing benchmarks in image-quality preservation at low bit rates. By pairing compact representation learning with accurate reconstruction, it is valuable both for academic study and for real-world systems that must retain image fidelity under tight size constraints. Continued work in this direction can drive further advances in compression, image processing, and computational vision.