MangaDiT: Reference-Guided Line Art Colorization with Hierarchical Attention in Diffusion Transformers (2508.09709v1)
Abstract: Recent advances in diffusion models have significantly improved the performance of reference-guided line art colorization. However, existing methods still struggle with region-level color consistency, especially when the reference and target images differ in character pose or motion. Instead of relying on external matching annotations between the reference and target, we propose to discover semantic correspondences implicitly through internal attention mechanisms. In this paper, we present MangaDiT, a powerful model for reference-guided line art colorization based on Diffusion Transformers (DiT). Our model takes both line art and reference images as conditional inputs and introduces a hierarchical attention mechanism with a dynamic attention weighting strategy. This mechanism augments the vanilla attention with an additional context-aware path that leverages pooled spatial features, effectively expanding the model's receptive field and enhancing region-level color alignment. Experiments on two benchmark datasets demonstrate that our method significantly outperforms state-of-the-art approaches, achieving superior performance in both qualitative and quantitative evaluations.
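To make the described mechanism concrete, below is a minimal, self-contained sketch of the general idea the abstract outlines: a vanilla token-level attention path augmented by a context-aware path that attends over pooled spatial features, with a learned, input-dependent weight mixing the two. This is not the authors' implementation; the module structure, pooling choice, gating design, and all names (`HierarchicalAttention`, `pool_size`, `gate`, etc.) are assumptions made for illustration.

```python
# Hedged sketch of hierarchical attention with dynamic weighting (not the paper's code).
# Path 1: standard self-attention over all tokens.
# Path 2: attention against average-pooled keys/values, widening the receptive field.
# A small gate predicts, per token, how much of the pooled path to mix in.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, pool_size: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Average pooling over the spatial grid builds coarse "context" tokens (assumed choice).
        self.pool = nn.AvgPool2d(pool_size)
        # Dynamic attention weighting: per-token gate in [0, 1] (assumed design).
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) token sequence; h * w == N gives the spatial layout of the tokens.
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)

        # Path 1: vanilla token-level attention.
        fine = F.scaled_dot_product_attention(q, k, v)

        # Path 2: context-aware attention against pooled (coarser) keys and values.
        def pool_tokens(t: torch.Tensor) -> torch.Tensor:
            t = t.reshape(B * self.num_heads, h, w, self.head_dim).permute(0, 3, 1, 2)
            t = self.pool(t)  # (B*heads, head_dim, h', w')
            return t.flatten(2).transpose(1, 2).reshape(B, self.num_heads, -1, self.head_dim)

        coarse = F.scaled_dot_product_attention(q, pool_tokens(k), pool_tokens(v))

        # Dynamic weighting between the two paths, predicted from the input tokens.
        alpha = self.gate(x).unsqueeze(1)  # (B, 1, N, 1), broadcasts over heads and channels
        out = (1 - alpha) * fine + alpha * coarse
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


if __name__ == "__main__":
    attn = HierarchicalAttention(dim=64, num_heads=8, pool_size=4)
    tokens = torch.randn(2, 16 * 16, 64)   # batch of 2, 16x16 latent grid
    print(attn(tokens, h=16, w=16).shape)  # torch.Size([2, 256, 64])
```

In a DiT-style model, such a module would operate on the concatenated latent tokens of the noisy target, the line art, and the reference image; how the paper conditions on those inputs and schedules the weighting is not specified in the abstract, so the sketch keeps the gate purely token-dependent.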