Relightful Harmonization: Lighting-aware Portrait Background Replacement (2312.06886v2)
Abstract: Portrait harmonization aims to composite a subject into a new background, adjusting its lighting and color to ensure harmony with the background scene. Existing harmonization techniques often focus only on adjusting the global color and brightness of the foreground and ignore crucial illumination cues from the background, such as the apparent lighting direction, which leads to unrealistic compositions. We introduce Relightful Harmonization, a lighting-aware diffusion model designed to seamlessly harmonize sophisticated lighting effects on the foreground portrait using any background image. Our approach unfolds in three stages. First, we introduce a lighting representation module that allows our diffusion model to encode lighting information from the target image background. Second, we introduce an alignment network that aligns lighting features learned from the image background with lighting features learned from panoramic environment maps, which are a complete representation of scene illumination. Last, to further boost the photorealism of the proposed method, we introduce a novel data simulation pipeline that generates synthetic training pairs from a diverse range of natural images; these pairs are used to refine the model. Our method outperforms existing methods in visual fidelity and lighting coherence and shows superior generalization in real-world testing scenarios, highlighting its versatility and practicality.
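The abstract does not specify the architecture or training loss of the alignment network in the second stage. As a rough, hypothetical illustration of the idea of aligning two lighting feature spaces, the sketch below fits a map so that lighting features derived from a background image regress onto lighting features derived from an environment map. The linear map, the mean-squared-error objective, full-batch gradient descent, the 3-dimensional toy features, and the names `apply_linear`, `alignment_loss`, and `train_alignment` are all assumptions made for illustration; this is not the paper's method.

```python
import random


def apply_linear(W, v):
    """Hypothetical alignment network: a single linear layer computing W @ v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def alignment_loss(W, bg_feats, env_feats):
    """Mean squared error between aligned background features and env-map features.

    MSE is an assumed objective chosen only for this illustration.
    """
    total = 0.0
    for b, e in zip(bg_feats, env_feats):
        pred = apply_linear(W, b)
        total += sum((p - t) ** 2 for p, t in zip(pred, e)) / len(e)
    return total / len(bg_feats)


def train_alignment(bg_feats, env_feats, dim, lr=0.2, steps=500, seed=0):
    """Fit W by full-batch gradient descent so that W @ bg_feat approximates env_feat."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]
    n = len(bg_feats)
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(dim)]
        for b, e in zip(bg_feats, env_feats):
            pred = apply_linear(W, b)
            for i in range(dim):
                # dL/dW_ij = (1/n) * (2/dim) * (pred_i - e_i) * b_j
                err = 2.0 * (pred[i] - e[i]) / (dim * n)
                for j in range(dim):
                    grad[i][j] += err * b[j]
        for i in range(dim):
            for j in range(dim):
                W[i][j] -= lr * grad[i][j]
    return W


# Toy data (hypothetical): env-map features are an unknown linear function
# of the background features, so a linear alignment map can recover it.
rng = random.Random(1)
A_TRUE = [[0.5, 0.2, 0.0], [0.0, 1.0, -0.3], [0.1, 0.0, 0.8]]
bg = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
env = [apply_linear(A_TRUE, b) for b in bg]

W = train_alignment(bg, env, dim=3)
print(alignment_loss(W, bg, env))
```

In a real system both feature sets would come from learned encoders and the alignment network would be a deeper model; the linear version only shows the regression-style objective that pulls background-derived lighting features toward the more complete environment-map representation.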