TC-Light: Temporally Coherent Generative Rendering for Realistic World Transfer (2506.18904v2)

Published 23 Jun 2025 in cs.CV

Abstract: Illumination and texture editing are critical dimensions for world-to-world transfer, which is valuable for applications including sim2real and real2real visual data scaling up for embodied AI. Existing techniques generatively re-render the input video to realize the transfer, such as video relighting models and conditioned world generation models. Nevertheless, these models are predominantly limited to the domain of training data (e.g., portrait) or fall into the bottleneck of temporal consistency and computation efficiency, especially when the input video involves complex dynamics and long durations. In this paper, we propose TC-Light, a novel generative renderer to overcome these problems. Starting from the video preliminarily relighted by an inflated video relighting model, it optimizes appearance embedding in the first stage to align global illumination. Then it optimizes the proposed canonical video representation, i.e., Unique Video Tensor (UVT), to align fine-grained texture and lighting in the second stage. To comprehensively evaluate performance, we also establish a long and highly dynamic video benchmark. Extensive experiments show that our method enables physically plausible re-rendering results with superior temporal coherence and low computation cost. The code and video demos are available at https://dekuliutesla.github.io/tclight/.

Summary

An Examination of TC-Light: Efficient Relighting with Temporal Consistency for Dynamic Long Videos

The paper "TC-Light: Temporally Consistent Relighting for Dynamic Long Videos" presents a novel approach for addressing challenges inherent in the relighting of long-duration videos with complex dynamics. The authors introduce a two-stage optimization method in a framework called TC-Light, which significantly enhances the temporal consistency and computational efficiency of video relighting tasks. This research extends beyond the predominant focus of existing techniques on portrait videos, addressing the broader and more intricate challenge of relighting videos with dynamic scenes and frequent foreground object transitions.

The paper identifies the limitations of current video relighting techniques, noting that many methods are either limited to static images or lack efficiency and consistency when handling videos. The core contribution of TC-Light is a post-optimization mechanism that addresses these limitations with a decoupled, two-stage strategy. Starting from a video preliminarily relit by an inflated video relighting model, the first stage optimizes an appearance embedding to align global illumination. The second stage optimizes the Unique Video Tensor (UVT), a canonical video representation, to align fine-grained texture and lighting.
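To make the two-stage idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that stage one can be approximated by a closed-form per-frame gain and bias that matches each frame's color statistics, whereas the paper optimizes an appearance embedding. It also assumes that stage two has access to a precomputed integer `index_map` assigning each pixel to a canonical entry; how the paper builds its Unique Video Tensor is not described in this summary. The function names `align_global_illumination`, `build_shared_tensor`, and `render_from_tensor` are hypothetical.

```python
import numpy as np


def align_global_illumination(video):
    """Stage-one stand-in (assumption): closed-form per-frame gain and bias.

    video: float array of shape (T, H, W, 3).
    Each frame is rescaled per channel so that its mean and standard
    deviation match the average statistics over the whole clip.
    """
    num_frames = video.shape[0]
    flat = video.reshape(num_frames, -1, 3)
    mean = flat.mean(axis=1)                 # (T, 3) per-frame channel means
    std = flat.std(axis=1) + 1e-8            # (T, 3), epsilon avoids divide-by-zero
    target_mean = mean.mean(axis=0)          # (3,) clip-level target statistics
    target_std = std.mean(axis=0)
    gain = target_std / std                  # (T, 3)
    bias = target_mean - gain * mean         # (T, 3)
    return video * gain[:, None, None, :] + bias[:, None, None, :]


def build_shared_tensor(video, index_map):
    """Stage-two stand-in (assumption): one shared color per canonical index.

    index_map: non-negative int array of shape (T, H, W); pixels that carry
    the same index are treated as the same scene point and are averaged.
    """
    channels = video.shape[-1]
    ids = index_map.reshape(-1)
    num_entries = int(ids.max()) + 1
    sums = np.zeros((num_entries, channels), dtype=np.float64)
    # Unbuffered scatter-add so repeated indices accumulate correctly.
    np.add.at(sums, ids, video.reshape(-1, channels).astype(np.float64))
    counts = np.bincount(ids, minlength=num_entries)
    return sums / np.maximum(counts, 1)[:, None]


def render_from_tensor(tensor, index_map):
    """Gather the shared entries back into a (T, H, W, C) video."""
    return tensor[index_map]
```

In this sketch, temporal consistency comes from the representation itself: every pixel that maps to the same entry is rendered with the same value, so flicker between matched pixels cannot survive the gather step. A real pipeline would optimize the tensor entries against a loss rather than average them.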

The paper's quantitative results indicate that TC-Light achieves better temporal coherence than competing methods. In the reported experiments, it produces high-quality relighting at lower computational cost than other state-of-the-art approaches. Because it works as a post-optimization step, it keeps both VRAM usage and runtime low, which allows it to handle lengthy video sequences that would typically pose challenges to existing models.

The paper also contributes a new benchmark for relighting tasks, made up of long and highly dynamic videos from a diverse range of scenarios. The benchmark serves both as a rigorous testbed for the method and as a resource for the broader research community.

Key claims made by the authors include the improved temporal coherence and reduced computational cost of TC-Light, supported by extensive quantitative evaluations and a user preference study. The paper also describes the technical foundation of its components explicitly: TC-Light builds on IC-Light, a state-of-the-art image relighting model, and adds enhancements that adapt IC-Light to the video domain.

Looking forward, the implications of this work could expand into the field of embodied AI, where large-scale, high-quality data generation is critical. By effectively bridging the simulation-to-real gap through reliable relighting, such methods could become instrumental in training models that operate in visually complex real-world environments. Furthermore, the efficiency of TC-Light presents opportunities for real-time applications, such as live video manipulation and augmented reality experiences.

Overall, TC-Light stands as a substantial advancement in the field of video relighting, providing a scalable and effective solution to challenges imposed by lengthy and complex input scenarios. Its methodological innovations and extensive benchmarking set a strong precedent for future research that aims to push the boundaries of efficient and consistent video editing techniques.
