
Face Swap via Diffusion Model (2403.01108v2)

Published 2 Mar 2024 in cs.CV

Abstract: This technical report presents a diffusion-model-based framework for swapping faces between two portrait images. The basic framework consists of three components, namely IP-Adapter, ControlNet, and Stable Diffusion's inpainting pipeline, which handle face feature encoding, multi-conditional generation, and face inpainting, respectively. In addition, we introduce facial guidance optimization and CodeFormer-based blending to further improve generation quality. Specifically, we adopt a recent lightweight customization method (DreamBooth-LoRA) to ensure identity consistency by 1) using the rare identifier "sks" to represent the source identity and 2) injecting the image features of the source portrait into each cross-attention layer in the same way as the text features. We then rely on the strong inpainting ability of Stable Diffusion and use the Canny edge map and the face detection annotation of the target portrait as conditions to guide ControlNet's generation and align the source portrait with the target portrait. To further correct face alignment, we add a facial guidance loss that optimizes the text embedding during sample generation. The code is available at: https://github.com/somuchtome/Faceswap
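The second identity mechanism above, injecting source-portrait image features into each cross-attention layer the way text features are injected, corresponds to IP-Adapter's decoupled cross-attention: the same query attends separately to text tokens and to image tokens, and the two outputs are summed. The pure-Python sketch below only illustrates that arithmetic on toy matrices. It is not the report's code; it assumes the keys and values have already been projected (the learned projection matrices are omitted), and the weight `scale` and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(q[0])
    scores = [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]
    return matmul(softmax_rows(scores), v)


def decoupled_cross_attention(
    q: Matrix,
    k_text: Matrix,
    v_text: Matrix,
    k_image: Matrix,
    v_image: Matrix,
    scale: float = 1.0,  # hypothetical weight on the image branch
) -> Matrix:
    """Text-token attention plus a separately weighted image-token attention."""
    text_out = attention(q, k_text, v_text)
    image_out = attention(q, k_image, v_image)
    return [
        [t + scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, image_out)
    ]
```

In this sketch, `scale=0.0` recovers ordinary text-conditioned cross-attention, so the image branch can be switched off without touching the text path; larger values weight the source-portrait features more heavily.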

