
COIN: COmpression with Implicit Neural representations (2103.03123v2)

Published 3 Mar 2021 in eess.IV, cs.CV, and cs.LG

Abstract: We propose a new simple approach for image compression: instead of storing the RGB values for each pixel of an image, we store the weights of a neural network overfitted to the image. Specifically, to encode an image, we fit it with an MLP which maps pixel locations to RGB values. We then quantize and store the weights of this MLP as a code for the image. To decode the image, we simply evaluate the MLP at every pixel location. We found that this simple approach outperforms JPEG at low bit-rates, even without entropy coding or learning a distribution over weights. While our framework is not yet competitive with state of the art compression methods, we show that it has various attractive properties which could make it a viable alternative to other neural data compression approaches.

Citations (196)

Summary

  • The paper presents a novel compression method using an overfitted MLP to directly map pixel coordinates to RGB values.
  • It employs MLPs with sinusoidal activations (SIRENs) to retain high-frequency details, achieving better PSNR than JPEG at low bit-rates such as 0.3 bpp.
  • The approach transforms image compression into a model compression problem, opening new research avenues in implicit neural representations.

An Analysis of COIN: COmpression with Implicit Neural Representations

This paper offers a distinctive approach to image compression based on implicit neural representations. Traditional neural image compression methods typically rely on an autoencoder architecture: an encoder maps the input image to a latent code, the code is entropy coded, and a decoder reconstructs the image from it. COIN diverges considerably from this pipeline. To encode an image, an MLP (Multi-Layer Perceptron) is overfitted to take pixel locations as input and output approximations of the corresponding RGB values; the overfitted MLP's weights are then stored and transmitted as the compressed code for the image, and decoding amounts to evaluating the MLP at every pixel location.

The authors employ MLPs with sinusoidal activation functions, known as SIRENs, which have shown promise in preserving the high-frequency detail critical to image representation. The method marks a departure from the norm in that it represents an image by directly optimizing an MLP rather than by learning an encoder-decoder pair. Notably, COIN outperforms JPEG in PSNR (Peak Signal-to-Noise Ratio) at low bit-rates, a significant result given that it dispenses with sophisticated entropy coding schemes.
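
To make the pipeline concrete, the following is a minimal PyTorch sketch of the idea: a small MLP with sine activations is overfitted to a single image by minimizing mean squared error over a grid of normalized pixel coordinates. The layer widths, step count, learning rate, and `w0` frequency are illustrative assumptions, and the specialized SIREN weight initialization is omitted; this is not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Siren(nn.Module):
    """Small MLP with sine activations mapping (x, y) coordinates to (R, G, B)."""

    def __init__(self, hidden_dim=32, num_layers=5, w0=30.0):
        super().__init__()
        dims = [2] + [hidden_dim] * num_layers
        self.hidden = nn.ModuleList(
            [nn.Linear(d_in, d_out) for d_in, d_out in zip(dims[:-1], dims[1:])]
        )
        self.out = nn.Linear(hidden_dim, 3)
        self.w0 = w0

    def forward(self, coords):
        h = coords
        for layer in self.hidden:
            h = torch.sin(self.w0 * layer(h))  # sinusoidal activation
        return self.out(h)

def fit_image(image, num_steps=10000, lr=2e-4):
    """Overfit a Siren to a single (H, W, 3) image tensor with values in [0, 1]."""
    H, W, _ = image.shape
    # Pixel coordinates normalised to [-1, 1]; these are the MLP's only inputs.
    ys = torch.linspace(-1.0, 1.0, H)
    xs = torch.linspace(-1.0, 1.0, W)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([gx, gy], dim=-1).reshape(-1, 2)
    targets = image.reshape(-1, 3)

    model = Siren()
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        loss = F.mse_loss(model(coords), targets)  # distortion = mean squared error
        loss.backward()
        opt.step()
    return model  # the weights of this model serve as the image's code
```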

Methodology

In COIN, the image compression task is reformulated as a model compression problem: the objective is to minimize a distortion measure, here the mean squared error, while using as few parameters as possible. The parameterization of the MLP is therefore crucial, since it determines how efficiently the network can approximate the image. The authors use a search over MLP architectures together with weight quantization to reach suitable model sizes, balancing the trade-off between bit-rate and reconstruction quality.
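
Continuing the sketch above, the rate side of this trade-off is simply the size of the stored weights. The 16-bit cast below is one simple stand-in for weight quantization, and the helper names are hypothetical rather than taken from the paper's code:

```python
def quantize_weights(model):
    """Cast the fitted weights to 16-bit floats; this dict is the transmitted code."""
    return {name: w.half() for name, w in model.state_dict().items()}

def bits_per_pixel(code, height, width):
    """Bit-rate implied by a weight code for a height x width image."""
    total_bits = sum(w.numel() * w.element_size() * 8 for w in code.values())
    return total_bits / (height * width)
```

Under this framing, architecture search amounts to choosing the width and depth that give the best distortion within a given bits-per-pixel budget.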

Numerical Results

Evaluation on the Kodak dataset shows that COIN is competitive with traditional codecs such as JPEG, particularly at low bit-rates. At 0.3 bits-per-pixel (bpp), for example, the stored MLP is far smaller than the decoder networks of typical learned codecs, which substantially reduces the memory required on the decoding device. This is particularly compelling for resource-constrained environments. However, COIN still falls short of the best contemporary methods in terms of compression efficiency.
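
As a rough sanity check on what this bit-rate implies for model size (assuming 16-bit weights as in the sketch above; these are back-of-the-envelope numbers, not figures from the paper):

```python
# A Kodak image is 768 x 512 pixels.
H, W = 512, 768
budget_bits = 0.3 * H * W        # ~118,000 bits, i.e. roughly 14.4 KiB
max_weights = budget_bits / 16   # ~7,400 16-bit weights for the whole MLP
print(f"{budget_bits:.0f} bits -> about {max_weights:.0f} 16-bit parameters")
```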

Implications and Future Directions

While COIN does not yet surpass the most advanced compression methods, its architectural simplicity and its recasting of image compression as model compression could spark considerable interest in future research. Potential enhancements include learning distributions over the network weights, refining model capacity through more advanced architecture search, and leveraging meta-learning for quick adaptation to new images. Decoupling the decoder from the complexity seen in standard learned codecs is a strategic advantage, and it illustrates the growing potential of implicit models beyond image compression, perhaps extending to video and audio.

Researchers are encouraged to explore optimizing the encoding process, for instance through more advanced quantization techniques or generative modeling of the weights. COIN's transformation of a compression problem into a model compression exercise exemplifies a thought-provoking shift in how machine learning can be used for data efficiency, and it represents a novel addition to the ongoing advancement of neural data compression strategies.
