ColorFlow: Retrieval-Augmented Image Sequence Colorization (2412.11815v2)

Published 16 Dec 2024 in cs.CV

Abstract: Automatic black-and-white image sequence colorization while preserving character and object identity (ID) is a complex task with significant market demand, such as in cartoon or comic series colorization. Despite advancements in visual colorization using large-scale generative models like diffusion models, challenges with controllability and identity consistency persist, making current solutions unsuitable for industrial application. To address this, we propose ColorFlow, a three-stage diffusion-based framework tailored for image sequence colorization in industrial applications. Unlike existing methods that require per-ID finetuning or explicit ID embedding extraction, we propose a novel robust and generalizable Retrieval Augmented Colorization pipeline for colorizing images with relevant color references. Our pipeline also features a dual-branch design: one branch for color identity extraction and the other for colorization, leveraging the strengths of diffusion models. We utilize the self-attention mechanism in diffusion models for strong in-context learning and color identity matching. To evaluate our model, we introduce ColorFlow-Bench, a comprehensive benchmark for reference-based colorization. Results show that ColorFlow outperforms existing models across multiple metrics, setting a new standard in sequential image colorization and potentially benefiting the art industry. We release our codes and models on our project page: https://zhuang2002.github.io/ColorFlow/.

Summary

The paper introduces a three-stage pipeline (RAP, ICP, GSRP) that integrates retrieval, dual-branch diffusion, and guided super-resolution for effective image colorization.
It employs a CLIP-based patch retrieval and LoRA-enhanced diffusion model to preserve detailed color identity and ensure seamless frame consistency.
Evaluations show up to a 37% FID score reduction compared to existing models, highlighting significant improvements in aesthetic quality and identity preservation.


The paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization" presents an innovative approach to image sequence colorization, a domain relevant for applications like manga, animated series, and black-and-white film colorization. The research aims to address the limitations of current image colorization techniques, particularly concerning identity preservation and color consistency across frames, which are crucial for practical applicability in industrial contexts.

Methodology

The proposed method, ColorFlow, leverages diffusion models to enhance image sequence colorization by integrating a novel pipeline composed of three main stages:

Retrieval-Augmented Pipeline (RAP): This stage extracts relevant colored image patches from a pool of reference images. Both the input and the reference images are divided into patches, and a CLIP image encoder embeds each patch so that the reference patches most similar to each input patch can be selected by cosine similarity. The selected patches are stitched into a composite image that supplies the color context needed for the subsequent colorization stage.
In-context Colorization Pipeline (ICP): The core colorization step uses a dual-branch design that includes a Colorization Guider branch alongside the main diffusion model. Combining layers from both the guider and the main diffusion model yields a dense, pixel-wise understanding of color identity, which keeps colors coherent for each character or object. LoRA, a low-rank adaptation technique, is used to fine-tune the diffusion model while preserving its foundational colorization capabilities.
Guided Super-Resolution Pipeline (GSRP): This stage upscales the colorized images to their original resolution and improves detail restoration by combining the structural features of the high-resolution grayscale input with the colorized output. Doing so helps counteract structural distortions introduced during colorization.
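The retrieval step in RAP can be illustrated with a minimal sketch. It assumes that patch embeddings have already been computed (for example by a CLIP image encoder) and are available as plain lists of floats; the function names, the toy two-dimensional vectors, and the choice of k are hypothetical and are not taken from the paper's released code.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query: Vector, references: Sequence[Vector], k: int) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k reference patches most similar to the query."""
    scored = [(i, cosine_similarity(query, ref)) for i, ref in enumerate(references)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def retrieve_for_patches(queries: Sequence[Vector], references: Sequence[Vector], k: int) -> List[List[int]]:
    """For every input patch, list the indices of its k best-matching reference patches."""
    return [[idx for idx, _ in retrieve_top_k(q, references, k)] for q in queries]


if __name__ == "__main__":
    # Toy embeddings standing in for real CLIP features (hypothetical values).
    reference_patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
    input_patches = [[2.0, 0.1], [0.0, 3.0]]
    print(retrieve_for_patches(input_patches, reference_patches, k=2))  # [[0, 2], [1, 2]]
```

In a full pipeline the returned indices would select the colored reference patches to stitch into the composite image; that stitching step is omitted here.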

Results

The authors introduce ColorFlow-Bench, a comprehensive benchmark to evaluate their method against existing models using metrics such as FID, PSNR, SSIM, and aesthetic scores. The paper demonstrates through extensive quantitative and qualitative evaluations that ColorFlow outperforms established techniques like MC-v2, AnimeColorDeOldify, and ScreenVAE in both preserving character identity across sequences and achieving superior aesthetic quality. Notably, ColorFlow reports a significant reduction in FID scores, up to 37%, indicating a marked improvement in the perceptual quality of colorized images.
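To make two of these quantities concrete, the sketch below computes PSNR from its standard definition, 10 · log10(MAX² / MSE), and a relative reduction as a percentage of a baseline score. The helper names and all numeric inputs are hypothetical illustrations; they are not scores reported in the paper, and this is not the benchmark's evaluation code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_reduction(baseline: float, improved: float) -> float:
    """Percentage by which `improved` is lower than `baseline` (for lower-is-better metrics such as FID)."""
    if baseline <= 0.0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


if __name__ == "__main__":
    # Hypothetical pixel values and scores, chosen only to exercise the functions.
    ground_truth = [10.0, 20.0, 30.0, 40.0]
    colorized = [12.0, 18.0, 33.0, 39.0]
    print(f"PSNR: {psnr(ground_truth, colorized):.2f} dB")
    print(f"Relative reduction: {relative_reduction(100.0, 63.0):.1f}%")
```

Read this way, a 37% reduction means the new FID is 63% of the baseline's FID; because lower FID is better, a larger reduction indicates a larger improvement.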

Implications and Future Work

The implications of ColorFlow are substantial for industries involved in digital art and media creation, offering a powerful tool for automating the time-intensive process of manual colorization, with potential applications beyond manga to other art forms, line art, and even real-world scenarios. Moreover, the three-stage pipeline introduces mechanisms for retrieval-augmented learning that can be adapted or extended to other domains within AI, such as style transfer and image synthesis.

Future developments could explore more advanced base diffusion models to further enhance color accuracy and realism. Extending the ColorFlow framework to more complex sequences, such as those involving varied lighting conditions or styles, could also broaden its applicability.

Conclusion

ColorFlow presents a structured, diffusion-based framework for colorizing image sequences with a retrieval-augmented approach, setting a new performance benchmark and offering a robust solution for practical application in digital art industries. By addressing key challenges in identity preservation and color accuracy, the research marks a significant step forward in automatic colorization. Advances of this kind are likely to integrate AI more deeply into creative workflows, improving efficiency while maintaining quality and artistic intent.
