
Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2 (1712.03400v1)

Published 9 Dec 2017 in cs.CV

Abstract: We review some of the most recent approaches to colorize gray-scale images using deep learning methods. Inspired by these, we propose a model which combines a deep Convolutional Neural Network trained from scratch with high-level features extracted from the Inception-ResNet-v2 pre-trained model. Thanks to its fully convolutional architecture, our encoder-decoder model can process images of any size and aspect ratio. Other than presenting the training results, we assess the "public acceptance" of the generated images by means of a user study. Finally, we present a carousel of applications on different types of images, such as historical photographs.

Citations (100)

Summary

  • The paper presents a deep learning approach for grayscale image colorization using a CNN architecture fused with high-level features from Inception-ResNet-v2 in the CIE L*a*b* color space.
  • The model shows promise in coloring high-level elements like vegetation, achieving near-realism on some images, but struggles with specific details and produces conservative outputs due to dataset size.
  • Based on a user study indicating promising perceived realism, the approach offers potential for historical image restoration, though achieving true realism requires larger datasets and probabilistic models.

Deep Koalarization: Image Colorization using Convolutional Neural Networks and Inception-ResNet-v2

The paper "Deep Koalarization: Image Colorization using CNNs and Inception-ResNet-v2," authored by Federico Baldassarre, Diego Gonzalez Morin, and Lucas Rodes-Guirao at the KTH Royal Institute of Technology, explores the automation of image colorization using deep learning. The authors propose an approach that combines a deep CNN trained from scratch with high-level features extracted from the pre-trained Inception-ResNet-v2 model. This integration aims to improve the colorization of gray-scale images, a task with applications ranging from historical image restoration to surveillance footage enhancement.

Methodology

The proposed model operates in the CIE L*a*b* color space, which separates chrominance (the a* and b* channels) from luminance (the L channel). Because the network only has to predict the two color channels while the input luminance is carried through unchanged, the reconstructed images retain a high level of detail. The architecture comprises four main components: an encoder, a feature extractor based on Inception-ResNet-v2, a fusion layer, and a decoder.
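The data preparation implied by this color-space choice can be sketched in NumPy. This assumes images are already converted to CIE L*a*b* (e.g. with a library such as scikit-image), with L in [0, 100] and a*, b* roughly in [-128, 127]; the exact normalization constants here are illustrative, not taken from the authors' code:

```python
import numpy as np

def split_lab(lab_image):
    """Split a CIE L*a*b* image into a network input and a training target.

    lab_image: array of shape (H, W, 3) with L in [0, 100] and
    a*, b* roughly in [-128, 127] (ranges assumed for illustration).
    Returns (L, ab): luminance scaled to [0, 1] as the grayscale input,
    and the a*b* channels scaled to about [-1, 1] as the regression target.
    """
    L = lab_image[..., :1] / 100.0    # luminance -> grayscale network input
    ab = lab_image[..., 1:] / 128.0   # chrominance -> channels to predict
    return L, ab

# Example: a dummy 4x4 "image" in L*a*b* coordinates
lab = np.zeros((4, 4, 3))
lab[..., 0] = 50.0    # mid-gray luminance
lab[..., 1] = 64.0    # a* channel
lab[..., 2] = -32.0   # b* channel
L, ab = split_lab(lab)
print(L.shape, ab.shape)   # (4, 4, 1) (4, 4, 2)
```

At inference time the predicted a*b* channels are rescaled and recombined with the original L channel before converting back to RGB.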

  1. Encoder: Processes input gray-scale images to produce a feature representation using a series of convolutional layers.
  2. Feature Extractor: Utilizes the Inception-ResNet-v2 model to obtain high-level image embeddings, facilitating the understanding of semantic content.
  3. Fusion: Combines the encoder's feature volume with replicated embeddings to ensure semantics are uniformly distributed spatially across the image.
  4. Decoder: Finally assembles the estimated a*b* color components from the fused feature volume.
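The fusion step (item 3 above) can be sketched in NumPy: the feature extractor yields a single embedding vector per image, which is replicated at every spatial location of the encoder's feature volume and concatenated along the channel axis. The shapes below (a 28×28 grid with 256 encoder channels and a 1000-dimensional embedding) are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def fuse(encoder_features, embedding):
    """Replicate a global embedding over the spatial grid and concatenate.

    encoder_features: (H, W, C) feature volume from the encoder.
    embedding: (D,) high-level feature vector from the pre-trained network.
    Returns an (H, W, C + D) fused volume in which the same semantic
    embedding is attached at every spatial position, distributing the
    image-level semantics uniformly across the grid.
    """
    h, w, _ = encoder_features.shape
    tiled = np.broadcast_to(embedding, (h, w, embedding.shape[0]))
    return np.concatenate([encoder_features, tiled], axis=-1)

features = np.random.rand(28, 28, 256)   # encoder output (shapes assumed)
embedding = np.random.rand(1000)         # Inception-style global embedding
fused = fuse(features, embedding)
print(fused.shape)   # (28, 28, 1256)
```

In the full model, a 1×1 convolution would typically follow the concatenation to mix the semantic and local features back down to a smaller channel count before decoding.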

The training objective is the mean squared error between the predicted a*b* color components and their ground-truth values, minimized with the Adam optimizer.

Results

The trained model colors certain high-level image elements, such as vegetation and water, convincingly, producing near-photo-realistic results in some instances. However, the network struggles with fine object-specific details, a limitation the authors attribute to the constrained size of the training dataset. Comparisons with existing methods reveal a tendency toward conservative predictions, visible as less saturated outputs. Although some results are encouraging, achieving true realism remains a challenge.

Implications and User Study

A user study conducted to evaluate the perceived realism of the generated images indicates a promising capability of the model to "fool" observers: approximately 45.87% of users misclassified recolored images as real. This suggests significant potential for the approach, although the study used selected images from the better-performing end of the spectrum.

Furthermore, the application of the model to historical photographs, albeit subjective, signals a burgeoning opportunity for the restoration of culturally significant materials. This could significantly impact archival studies, offering a visually immersive glimpse into the past.

Future Directions

Acknowledging the current limitations, the paper suggests expanding the training dataset size to encompass a broader image diversity, which could alleviate specificity issues. A probabilistic model akin to a variational autoencoder could enrich the mapping between luminance and color components. Moreover, extending colorization techniques to video sequences could revolutionize the restoration of historical documentaries, although this invokes additional complexity in ensuring temporal coherence.

Conclusion

While the paper advances the conversation in automated image colorization, it remains part of an ongoing exploration where manual intervention might still play a role. The confluence of deep learning architectures like CNNs and pre-trained models such as Inception-ResNet-v2 has forged a valuable path toward enhancing computational imaging tasks. The research opens avenues for future development, bridging classic restoration methods with cutting-edge technologies capable of delivering semi-automated solutions across diverse domains.
