L-C4: Language-Based Video Colorization for Creative and Consistent Color (2410.04972v2)

Published 7 Oct 2024 in cs.CV

Abstract: Automatic video colorization is inherently an ill-posed problem because each monochrome frame has multiple optional color candidates. Previous exemplar-based video colorization methods restrict the user's imagination due to the elaborate retrieval process. Alternatively, conditional image colorization methods combined with post-processing algorithms still struggle to maintain temporal consistency. To address these issues, we present Language-based video Colorization for Creative and Consistent Colors (L-C4) to guide the colorization process using user-provided language descriptions. Our model is built upon a pre-trained cross-modality generative model, leveraging its comprehensive language understanding and robust color representation abilities. We introduce the cross-modality pre-fusion module to generate instance-aware text embeddings, enabling the application of creative colors. Additionally, we propose temporally deformable attention to prevent flickering or color shifts, and cross-clip fusion to maintain long-term color consistency. Extensive experimental results demonstrate that L-C4 outperforms relevant methods, achieving semantically accurate colors, unrestricted creative correspondence, and temporally robust consistency.

