LumiSculpt: A Consistency Lighting Control Network for Video Generation (2410.22979v1)
Abstract: Lighting plays a pivotal role in the naturalness of generated video and strongly influences its aesthetic quality. However, because lighting is deeply coupled with the temporal features of video, it remains challenging to disentangle and model independent, coherent lighting attributes, which limits lighting control in video generation. In this paper, inspired by established controllable text-to-image (T2I) models, we propose LumiSculpt, which, for the first time, enables precise and consistent lighting control in text-to-video (T2V) generation models. LumiSculpt gives video generation strong interactive capabilities, accepting custom lighting reference image sequences as input. Its core learnable plug-and-play module provides control over lighting intensity, position, and trajectory in latent video diffusion models built on the DiT backbone. Additionally, to train LumiSculpt effectively and to address the scarcity of lighting data, we construct LumiHuman, a new lightweight and flexible dataset for portrait lighting in images and videos. Experimental results demonstrate that LumiSculpt achieves precise, high-quality lighting control in video generation.
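The abstract does not specify how the plug-and-play module attaches to the backbone. Below is a minimal, hypothetical PyTorch sketch of one common way such a branch could work: encode the lighting reference sequence's latents and inject zero-initialized residuals into the hidden states of a frozen DiT-style backbone (a ControlNet-like design). All names here (`LightingControlBranch`, `hidden_dim`, `num_blocks`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LightingControlBranch(nn.Module):
    """Hypothetical plug-and-play lighting branch (not the paper's code).

    Encodes VAE latents of a lighting reference video and produces one
    residual per injection point; each residual is added to the hidden
    states of the matching block of a frozen DiT backbone. Projections
    are zero-initialized, so at the start of training the branch leaves
    the backbone's behavior unchanged.
    """

    def __init__(self, in_channels: int = 4, hidden_dim: int = 1152,
                 num_blocks: int = 4):
        super().__init__()
        # Lightweight 3D-conv encoder over the latent lighting video:
        # (B, C, T, H, W) -> (B, hidden_dim, T, H, W).
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, hidden_dim // 4, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(hidden_dim // 4, hidden_dim, kernel_size=3, padding=1),
        )
        # One zero-initialized projection per injection point.
        self.zero_projs = nn.ModuleList(
            nn.Linear(hidden_dim, hidden_dim) for _ in range(num_blocks)
        )
        for proj in self.zero_projs:
            nn.init.zeros_(proj.weight)
            nn.init.zeros_(proj.bias)

    def forward(self, light_latents: torch.Tensor) -> list[torch.Tensor]:
        # light_latents: latents of the lighting reference sequence,
        # shape (B, C, T, H, W).
        feats = self.encoder(light_latents)           # (B, D, T, H, W)
        tokens = feats.flatten(2).transpose(1, 2)     # (B, T*H*W, D)
        # One residual per DiT block, in token layout.
        return [proj(tokens) for proj in self.zero_projs]


# Toy usage: each residual would be added to the hidden states of the
# corresponding backbone block during the denoising forward pass.
branch = LightingControlBranch()
light_latents = torch.randn(1, 4, 8, 32, 32)  # (B, C, T, H, W)
for r in branch(light_latents):
    print(r.shape)  # torch.Size([1, 8192, 1152])
```

Zero-initializing the injection layers is a standard trick for adapters on frozen generative backbones: it preserves the pretrained model's output at step zero and lets the lighting signal be learned gradually, which matches the abstract's framing of the module as learnable and plug-and-play.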