Variable Rate Image Compression with Recurrent Neural Networks (1511.06085v5)

Published 19 Nov 2015 in cs.CV, cs.LG, and cs.NE

Abstract: A large fraction of Internet traffic is now driven by requests from mobile devices with relatively small screens and often stringent bandwidth requirements. Due to these factors, it has become the norm for modern graphics-heavy websites to transmit low-resolution, low-bytecount image previews (thumbnails) as part of the initial page load process to improve apparent page responsiveness. Increasing thumbnail compression beyond the capabilities of existing codecs is therefore a current research focus, as any byte savings will significantly enhance the experience of mobile device users. Toward this end, we propose a general framework for variable-rate image compression and a novel architecture based on convolutional and deconvolutional LSTM recurrent networks. Our models address the main issues that have prevented autoencoder neural networks from competing with existing image compression algorithms: (1) our networks only need to be trained once (not per-image), regardless of input image dimensions and the desired compression rate; (2) our networks are progressive, meaning that the more bits are sent, the more accurate the image reconstruction; and (3) the proposed architecture is at least as efficient as a standard purpose-trained autoencoder for a given number of bits. On a large-scale benchmark of 32$\times$32 thumbnails, our LSTM-based approaches provide better visual quality than (headerless) JPEG, JPEG2000 and WebP, with a storage size that is reduced by 10% or more.

Authors (8)
  1. George Toderici (22 papers)
  2. Sean M. O'Malley (1 paper)
  3. Sung Jin Hwang (10 papers)
  4. Damien Vincent (25 papers)
  5. David Minnen (19 papers)
  6. Shumeet Baluja (15 papers)
  7. Michele Covell (12 papers)
  8. Rahul Sukthankar (39 papers)
Citations (640)

Summary

Variable Rate Image Compression with Recurrent Neural Networks

The paper presents a novel approach to image compression using recurrent neural networks (RNNs). The authors propose an architecture for variable-rate image compression that addresses limitations of traditional codecs such as JPEG, JPEG2000, and WebP, which perform poorly at very low resolutions and tight byte budgets, especially for thumbnails.

Technical Contributions

The work introduces a framework incorporating both convolutional and deconvolutional LSTM-based networks to optimize image compression. The architecture is characterized by several key innovations:

  1. Single Training Phase: The proposed networks are trained once and can accommodate different image dimensions and compression rates without retraining.
  2. Progressive Encoding: The networks can progressively enhance the visual quality of images by incrementally sending more bits.
  3. Efficiency and Flexibility: For a given number of bits, the architecture is at least as efficient as a standard purpose-trained autoencoder, while reducing storage size relative to existing codecs on thumbnails.

The architectures employ a combination of fully-connected, convolutional, and deconvolutional neural network layers. LSTM units maintain state across compression iterations, which makes residual error prediction more efficient: each successive pass only has to encode the reconstruction error left by the previous ones.
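To make that iteration concrete, below is a minimal sketch of a progressive residual-encoding loop in PyTorch. The `TinyEncoder`, `TinyDecoder`, and deterministic `binarize` function are illustrative stand-ins, not the paper's (de)convolutional LSTM networks or its stochastic binarizer; the layer sizes and the four-iteration budget are assumptions chosen only to keep the example self-contained.

```python
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for the paper's convolutional (LSTM) encoder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.Tanh(),
            nn.Conv2d(32, 8, kernel_size=3, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, x):
        return self.net(x)          # (N, 8, 8, 8) codes for a 32x32 input

class TinyDecoder(nn.Module):
    """Stand-in for the paper's deconvolutional (LSTM) decoder."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(8, 32, kernel_size=4, stride=2, padding=1), nn.Tanh(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1), nn.Tanh(),
        )
    def forward(self, bits):
        return self.net(bits)       # back to (N, 3, 32, 32)

def binarize(codes):
    # Deterministic sign binarization for inference; the paper uses a
    # stochastic binarizer during training so gradients can flow through it.
    return torch.sign(codes)

def progressive_compress(image, encoder, decoder, num_iterations=4):
    """Each iteration encodes the residual left by the previous iterations,
    so transmitting more iterations' bits progressively refines the image."""
    residual = image
    reconstruction = torch.zeros_like(image)
    bitstream = []
    for _ in range(num_iterations):
        bits = binarize(encoder(residual))      # fixed number of bits per pass
        bitstream.append(bits)
        reconstruction = reconstruction + decoder(bits)
        residual = image - reconstruction       # what is still missing
    return bitstream, reconstruction

# Example: one 32x32 RGB "thumbnail" scaled to [-1, 1].
encoder, decoder = TinyEncoder(), TinyDecoder()
x = torch.rand(1, 3, 32, 32) * 2 - 1
bits, x_hat = progressive_compress(x, encoder, decoder)
print(len(bits), x_hat.shape)
```

Truncating `bits` after any iteration still yields a usable (if coarser) reconstruction, which is what makes the encoding progressive and the bitrate variable without retraining.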

Numerical Results and Claims

The research benchmarks the proposed methods against JPEG, JPEG2000, and WebP using SSIM as the metric for image quality. Some highlights include:

  • The LSTM-based models outperform JPEG and WebP on 32x32 image benchmarks by providing better visual quality at reduced storage sizes, up to 12% lower average bitrate for comparable quality.
  • The approach surpasses headerless JPEG and JPEG2000 in SSIM scores across targeted storage sizes of 64 and 128 bytes for thumbnails.

These results underline the potential of the architectures to replace traditional codecs in scenarios where progressive and flexible compression is beneficial.
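As a note on the evaluation protocol, a comparison at a fixed byte budget reduces to computing SSIM between each codec's reconstruction and the original thumbnail and preferring the higher score. The sketch below is illustrative only: it uses scikit-image's `structural_similarity` and random arrays in place of real decoded images, and none of the names or numbers come from the paper.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def ssim_score(original: np.ndarray, reconstruction: np.ndarray) -> float:
    """SSIM between two uint8 grayscale 32x32 thumbnails (higher is better)."""
    return ssim(original, reconstruction, data_range=255)

# Stand-in data: a random "thumbnail" and two noisy reconstructions that
# mimic decodes from two codecs at the same byte budget.
rng = np.random.default_rng(0)
original = rng.integers(0, 256, (32, 32), dtype=np.uint8)
codec_a = np.clip(original.astype(int) + rng.integers(-8, 9, (32, 32)), 0, 255).astype(np.uint8)
codec_b = np.clip(original.astype(int) + rng.integers(-16, 17, (32, 32)), 0, 255).astype(np.uint8)

# At a fixed byte budget, the codec with the higher SSIM wins the comparison.
print("codec A SSIM:", ssim_score(original, codec_a))
print("codec B SSIM:", ssim_score(original, codec_b))
```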

Implications and Future Directions

The implications of this work extend to improving the efficiency of image transmission over networks, particularly benefiting mobile platforms where bandwidth may be limited. The ability to dynamically adjust bit allocation without retraining further enhances practical utility.

Future research directions might focus on extending these techniques to higher-resolution images and on further improving compression efficiency by entropy coding the generated bits. The approach could also be explored for video compression, in line with the growing demand for highly efficient video transmission.

Moreover, the paper points to the need for better dynamic bit-assignment algorithms that mitigate artifacts and allocate bits across image patches according to their spatial content. Achieving this would broaden the applicability of these methods.

In summary, the work combines deep learning and image-processing expertise to challenge existing paradigms in image compression, introducing methods that offer meaningful gains in quality and efficiency for small images and, potentially, for broader scenarios.