
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset (2503.14485v2)

Published 18 Mar 2025 in cs.GR and cs.CV

Abstract: Video portrait relighting remains challenging because the results need to be both photorealistic and temporally stable. This typically requires a strong model design that can capture complex facial reflections as well as intensive training on a high-quality paired video dataset, such as dynamic one-light-at-a-time (OLAT). In this work, we introduce Lux Post Facto, a novel portrait video relighting method that produces both photorealistic and temporally consistent lighting effects. From the model side, we design a new conditional video diffusion model built upon state-of-the-art pre-trained video diffusion model, alongside a new lighting injection mechanism to enable precise control. This way we leverage strong spatial and temporal generative capability to generate plausible solutions to the ill-posed relighting problem. Our technique uses a hybrid dataset consisting of static expression OLAT data and in-the-wild portrait performance videos to jointly learn relighting and temporal modeling. This avoids the need to acquire paired video data in different lighting conditions. Our extensive experiments show that our model produces state-of-the-art results both in terms of photorealism and temporal consistency.

Summary

  • The paper introduces Lux Post Facto, a novel method using a conditional video diffusion model and lighting control modules to achieve photorealistic and temporally stable video portrait relighting.
  • The model utilizes a hybrid dataset of static OLAT data and in-the-wild videos, demonstrating state-of-the-art performance against existing methods on metrics like PSNR, SSIM, and LPIPS.
  • This approach makes advanced video portrait relighting more accessible, enabling non-experts to perform high-quality lighting adjustments and advancing research in spatial-temporal diffusion models with explicit lighting controls.

Insights into "Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset"

The paper presents a novel approach to video portrait relighting, a problem that is inherently challenging because the output must be both photorealistic and temporally stable. Titled "Lux Post Facto," the method leverages advancements in diffusion models to address the limitations of existing techniques, offering a means to relight arbitrary in-the-wild portrait videos effectively.

Methodological Framework

Lux Post Facto recasts video relighting as a conditional generation task built on a state-of-the-art pre-trained video diffusion model. The pipeline first applies a delighting model that predicts a shading-free albedo video, followed by a relighting model that uses High Dynamic Range (HDR) environment maps to synthesize new illumination. Conditioning the spatio-temporal diffusion model keeps the output coherent across frames, the hallmark of a well-behaved video relighting system. A newly devised lighting control module injects "lighting embeddings" distilled from the HDR maps, giving the model precise control over the specified lighting conditions.
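To make the lighting-injection idea concrete, here is a minimal NumPy sketch of conditioning frame features on embeddings distilled from an HDR map via cross-attention. All shapes, the linear projection standing in for the paper's learned lighting encoder, and the single-head attention without learned Q/K/V weights are simplifying assumptions, not the authors' actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16        # token dimension (assumed)
N_FRAME = 8   # spatial tokens per frame (assumed)
N_LIGHT = 4   # lighting tokens distilled from the HDR map (assumed)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lighting_tokens(hdr_map, w_proj):
    """Flatten the HDR environment map and project it to N_LIGHT embedding
    tokens (a linear stand-in for the paper's learned lighting encoder)."""
    return (w_proj @ hdr_map.reshape(-1)).reshape(N_LIGHT, D)

def cross_attend(frame_tokens, light_tokens):
    """Frame tokens query the lighting tokens; a real model would also use
    learned Q/K/V projections, omitted here for brevity."""
    scores = frame_tokens @ light_tokens.T / np.sqrt(D)   # (N_FRAME, N_LIGHT)
    return softmax(scores) @ light_tokens                 # (N_FRAME, D)

hdr = rng.random((8, 16, 3))                        # toy lat-long HDR map
w = rng.normal(size=(N_LIGHT * D, hdr.size)) * 0.01
lights = lighting_tokens(hdr, w)
frames = rng.normal(size=(N_FRAME, D))
injected = frames + cross_attend(frames, lights)    # residual lighting injection
print(injected.shape)
```

The residual form means the diffusion backbone's features pass through unchanged apart from an additive, lighting-dependent correction, which is one common way such control modules are grafted onto pre-trained models.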

Hybrid Dataset Utilization

A noteworthy aspect of this paper is its hybrid dataset, which combines static-expression OLAT (one-light-at-a-time) captures with in-the-wild performance videos, avoiding the difficulty of acquiring paired video data under varying lighting conditions. The OLAT captures supervise relighting, while the in-the-wild videos supply natural motion, so the model jointly learns lighting transfer from static data and temporal dynamics from video. This hybrid approach fosters robust training that aligns the photorealistic standards of static captures with the dynamic requirements of video sequences.
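A hybrid training scheme like this can be sketched as a batch sampler that mixes the two pools. The record fields and the 50/50 mixing ratio below are illustrative assumptions; the paper does not publish its sampling schedule here.

```python
import random

# Hypothetical records: paired static OLAT data supervises relighting,
# while unpaired in-the-wild clips supervise temporal modeling.
olat_pool = [{"kind": "olat", "frames": 1, "paired_lighting": True}
             for _ in range(3)]
wild_pool = [{"kind": "wild", "frames": 16, "paired_lighting": False}
             for _ in range(3)]

def sample_batch(batch_size, p_olat=0.5, seed=None):
    """Draw a mixed batch from the two sources; p_olat is the assumed
    probability of picking a static OLAT sample over a video clip."""
    rng = random.Random(seed)
    return [rng.choice(olat_pool if rng.random() < p_olat else wild_pool)
            for _ in range(batch_size)]

batch = sample_batch(4, seed=0)
print([item["kind"] for item in batch])
```

In practice the loss applied to each sample would differ by source (a relighting reconstruction loss for OLAT pairs, a temporal/denoising objective for video), which is what lets one model learn both behaviors without paired video data.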

Experimental Evaluation

The model delivers state-of-the-art results when evaluated against existing single-image and video relighting methods. Qualitative and quantitative assessments indicate Lux Post Facto excels across metrics such as PSNR, SSIM, and perceptual quality measures like LPIPS and NIQE. These results demonstrate the model's capacity to maintain high fidelity in lighting effects and temporal stability, essential for realistic video portrait editing. Additionally, comparisons with contemporary models like NVPR and SwitchLight reveal Lux Post Facto's edge in relighting realism and detail preservation.
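Of the metrics above, PSNR is simple enough to compute from scratch, and a small example clarifies what "higher is better" means in decibels (SSIM, LPIPS, and NIQE require structural or learned models and are typically taken from libraries such as scikit-image or the lpips package). The toy images below are random data, not results from the paper.

```python
import numpy as np

def psnr(ref, test, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, max_val].
    Identical images score infinity; heavier distortion lowers the score."""
    mse = np.mean((ref - test) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(max_val**2 / mse)

rng = np.random.default_rng(1)
ref = rng.random((64, 64, 3))                                   # toy reference
noisy = np.clip(ref + rng.normal(scale=0.05, size=ref.shape), 0, 1)
print(round(psnr(ref, noisy), 1))
```

For video relighting, per-frame scores like this are usually averaged over the clip; temporal consistency is what the perceptual and qualitative evaluations are meant to capture, since PSNR alone cannot detect frame-to-frame flicker.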

Implications and Future Directions

Practically, the model broadens the accessibility of advanced video editing tools, enabling non-experts to execute high-caliber lighting adjustments in post-production without extensive equipment or technical prowess. Theoretically, it pushes forward the research frontier in diffusion models by integrating spatial and temporal conditioning with explicit lighting controls, thus presenting a comprehensive framework adaptable to future advancements in both dataset curation and model architecture.

Looking ahead, enhancements in real-time processing capabilities could extend applicability, particularly within interactive environments where immediate visual feedback is paramount. Additionally, further exploration into lightweight model architectures may alleviate computational demands, thereby expanding usability across varied hardware constraints. The ethical dimension of relighting capabilities, pertinent to content authenticity and potential misuse, suggests a path towards integrating detection mechanisms and responsible usage frameworks within the system design.

Lux Post Facto underscores the evolving capabilities of diffusion models in bridging static and dynamic image processing, yielding a toolset that not only enriches visual storytelling but also democratizes access to advanced photorealistic video manipulation techniques.