- The paper introduces a novel generative framework for shadow removal using diffusion models that effectively separates shadows from skin tones.
- It leverages a compositional repurposing strategy with background harmonization and a guided-upsampling network to maintain high-frequency details.
- Quantitative results, measured with LPIPS, SSIM, and AdaFace identity scores, demonstrate superior facial identity preservation and lighting consistency compared to conventional methods.
Generative Portrait Shadow Removal
The paper "Generative Portrait Shadow Removal," presented by Yoon et al., introduces an advanced method for removing shadows in portrait images using diffusion models. The method distinguishes itself by addressing the inherently ill-posed problem of separating complex lighting from the original skin color, particularly when strong shadows are present.
Methodology Overview
The authors propose a novel approach by framing the shadow removal task as a generative problem: a diffusion model is trained to reconstruct the shadow-free image from noise, conditioned on the input portrait. By adopting a global generation strategy, the method moves beyond local appearance propagation, which traditionally struggles with hard shadow boundaries.
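To make the conditioning idea concrete, the sketch below shows one training step of a denoising diffusion model conditioned on the shadowed portrait via channel-wise concatenation, using the `diffusers` library. The architecture, conditioning scheme, and hyperparameters are illustrative assumptions rather than the authors' actual implementation.

```python
# Minimal sketch: one training step of a diffusion model conditioned on the
# shadowed portrait by channel-wise concatenation. Illustrative only; the
# paper's architecture and conditioning scheme may differ.
import torch
import torch.nn.functional as F
from diffusers import UNet2DModel, DDPMScheduler

# Hypothetical UNet: 6 input channels = 3 (noisy shadow-free target) + 3 (shadowed input).
model = UNet2DModel(sample_size=64, in_channels=6, out_channels=3)
scheduler = DDPMScheduler(num_train_timesteps=1000)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def training_step(shadowed, shadow_free):
    """shadowed, shadow_free: (B, 3, 64, 64) tensors scaled to [-1, 1]."""
    noise = torch.randn_like(shadow_free)
    t = torch.randint(0, scheduler.config.num_train_timesteps, (shadow_free.size(0),))
    noisy_target = scheduler.add_noise(shadow_free, noise, t)
    # Condition the denoiser by concatenating the shadowed portrait.
    model_input = torch.cat([noisy_target, shadowed], dim=1)
    noise_pred = model(model_input, t).sample
    loss = F.mse_loss(noise_pred, noise)  # standard epsilon-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```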
The key innovation lies in a compositional repurposing framework, in which a diffusion model is trained in stages (a minimal staging sketch follows the list):
- Background Harmonization: Fine-tuning a pre-trained image generation model to synthesize portrait images consistent with different background lighting, thereby developing a robust lighting model.
- Shadow Removal: Further refinement of the diffusion model using shadow-paired datasets to create shadow-free images that maintain fidelity with the original lighting conditions.
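Under these assumptions, the two stages amount to running the same conditioned training step on two different kinds of image pairs, so the lighting prior learned during background harmonization carries over to shadow removal. The sketch below reuses the hypothetical `training_step` from the previous sketch; the random tensors merely stand in for real paired data.

```python
# Illustrative two-stage fine-tuning of one conditional diffusion model.
import torch
from torch.utils.data import DataLoader, TensorDataset

def finetune(step_fn, loader, epochs=1):
    for _ in range(epochs):
        for cond_image, target_image in loader:
            step_fn(cond_image, target_image)

# Stage 1 placeholder: (portrait, background-consistent relit portrait) pairs.
harmonization_pairs = TensorDataset(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))
# Stage 2 placeholder: (shadowed portrait, shadow-free portrait) pairs.
shadow_pairs = TensorDataset(torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64))

finetune(training_step, DataLoader(harmonization_pairs, batch_size=4, shuffle=True))
finetune(training_step, DataLoader(shadow_pairs, batch_size=4, shuffle=True))  # continues from stage-1 weights
```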
To preserve image details, a guided-upsampling network is introduced. This module restores high-frequency details such as wrinkles and hair, which are often lost during the diffusion process.
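One plausible way to realize such a module is to upsample the low-resolution diffusion output and let a small convolutional network re-inject detail from the full-resolution input portrait as a residual. The module below is a minimal sketch under that assumption, not the authors' architecture.

```python
# Minimal guided-upsampling sketch: fuse an upsampled low-res result with a
# full-resolution guide image to recover high-frequency detail.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedUpsampler(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(6, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, low_res_output, full_res_guide):
        # Upsample the diffusion output to the guide's resolution.
        up = F.interpolate(low_res_output, size=full_res_guide.shape[-2:],
                           mode="bilinear", align_corners=False)
        # Predict a residual from the pair so fine detail (wrinkles, hair)
        # can be transferred from the guide onto the upsampled result.
        return up + self.fuse(torch.cat([up, full_res_guide], dim=1))

# Example usage with dummy tensors.
upsampler = GuidedUpsampler()
low_res = torch.randn(1, 3, 64, 64)    # diffusion output
guide = torch.randn(1, 3, 256, 256)    # original full-resolution portrait
high_res = upsampler(low_res, guide)   # -> (1, 3, 256, 256)
```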
Dataset Construction
Critical to the approach is a high-fidelity dataset combining real-world and synthetic images. The authors use a light-stage capture system to acquire reference portraits under varied lighting conditions. Synthetic simulations model external shadows cast by arbitrary occluders, and augmented real-world data is added to improve generalization.
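The external-shadow simulation can be pictured as compositing a soft occluder matte over a shadow-free reference to darken it, yielding a (shadowed, shadow-free) training pair. The toy sketch below uses a blurred rectangular silhouette as a stand-in for arbitrary occluder geometry; a real pipeline would render shadows against the captured lighting.

```python
# Toy sketch: synthesize an external-shadow training pair from a shadow-free image.
import numpy as np
from scipy.ndimage import gaussian_filter

def synth_shadow_pair(shadow_free, rng, min_darkening=0.3, max_darkening=0.7):
    """shadow_free: HxWx3 float image in [0, 1]. Returns (shadowed, shadow_free)."""
    h, w, _ = shadow_free.shape
    # Random rectangular occluder silhouette as a stand-in for arbitrary shapes.
    mask = np.zeros((h, w), dtype=np.float32)
    y0, x0 = rng.integers(0, h // 2), rng.integers(0, w // 2)
    mask[y0:y0 + h // 2, x0:x0 + w // 2] = 1.0
    # Blur the silhouette to mimic a soft shadow boundary (penumbra).
    matte = gaussian_filter(mask, sigma=float(rng.uniform(3, 15)))
    darkening = rng.uniform(min_darkening, max_darkening)
    shadowed = shadow_free * (1.0 - darkening * matte[..., None])
    return shadowed.astype(np.float32), shadow_free

rng = np.random.default_rng(0)
clean = np.full((256, 256, 3), 0.8, dtype=np.float32)  # placeholder shadow-free portrait
shadowed, clean = synth_shadow_pair(clean, rng)
```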
Evaluation and Impact
Quantitative assessments demonstrate that the proposed method achieves superior perceptual similarity (LPIPS) and structural similarity (SSIM) compared to traditional methods like UNet and TransformerNet. Notably, the method handles varying shadow intensities and a wide range of external occluders with greater robustness. It also preserves identity details, as evidenced by high AdaFace identity-preservation scores.
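For reference, the perceptual metrics reported here are typically computed as in the sketch below, using the `lpips` package and scikit-image's SSIM; identity preservation would additionally compare face-recognition embeddings (e.g. AdaFace) via cosine similarity, which is omitted for brevity.

```python
# Sketch of LPIPS and SSIM evaluation for a (prediction, ground truth) image pair.
import numpy as np
import torch
import lpips
from skimage.metrics import structural_similarity

loss_fn = lpips.LPIPS(net="alex")  # lower LPIPS = closer perceptual match

def evaluate_pair(pred, target):
    """pred, target: HxWx3 float arrays in [0, 1]."""
    # LPIPS expects NCHW tensors scaled to [-1, 1].
    to_tensor = lambda x: torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0).float() * 2 - 1
    lpips_score = loss_fn(to_tensor(pred), to_tensor(target)).item()
    ssim_score = structural_similarity(pred, target, channel_axis=-1, data_range=1.0)
    return {"lpips": lpips_score, "ssim": ssim_score}

pred = np.random.rand(256, 256, 3).astype(np.float32)
target = np.random.rand(256, 256, 3).astype(np.float32)
print(evaluate_pair(pred, target))
```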
The practical implications of this research are significant. Enhanced shadow removal opens avenues for more accurate image editing and processing applications, including improved portrait relighting and human appearance modeling. The method’s ability to maintain the original lighting context while removing shadows is particularly beneficial for applications demanding high visual coherence.
Future Directions
While the paper presents a robust framework, there are avenues for future work. The method could be extended to more dynamic scenarios beyond static portraits. Additionally, addressing limitations in skin-color preservation under extreme lighting discrepancies could push this research further.
In conclusion, Yoon et al.'s work offers significant contributions to computational photography and image processing, presenting a robust framework for generative shadow removal in portraits. The method not only advances the current state of the art but also broadens the scope of generative modeling applications in AI.