DiffBody: Diffusion-based Pose and Shape Editing of Human Images (2401.02804v2)
Abstract: Pose and body shape editing in human images has received increasing attention. However, current methods often struggle with dataset biases and degrade both realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits while preserving identity. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape. Because this initial textured body model has artifacts due to occlusion and inaccurate body shape, the rendered image undergoes a diffusion-based refinement. In this refinement, strong noise destroys body structure and identity, whereas insufficient noise does not remove the artifacts. We thus propose an iterative refinement with weak noise, applied first to the whole body and then to the face. We further enhance realism by fine-tuning text embeddings via self-supervised learning. Our quantitative and qualitative evaluations demonstrate that our method outperforms existing methods across various datasets.
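The control flow of the iterative weak-noise refinement can be illustrated with a toy sketch. It is not the authors' implementation: the real method relies on a pretrained diffusion model, which is replaced here by a hypothetical 3x3 box-blur denoiser so the example runs with NumPy alone. The function names, the `strength` parameter (standing in for the noise level), the iteration counts, and the use of a boolean mask to restrict the face pass are all assumptions made for illustration.

```python
import numpy as np


def add_noise(image, strength, rng):
    """Blend the image with Gaussian noise.

    `strength` in [0, 1] plays the role of 1 - alpha_bar_t in a diffusion
    forward process (an assumption of this sketch); small values mean weak noise.
    """
    noise = rng.standard_normal(image.shape)
    return np.sqrt(1.0 - strength) * image + np.sqrt(strength) * noise


def toy_denoiser(noisy):
    """Hypothetical stand-in for a pretrained diffusion denoiser: a 3x3 box blur."""
    padded = np.pad(noisy, 1, mode="edge")
    h, w = noisy.shape
    out = np.zeros_like(noisy)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def refine(image, strength, num_iters, rng, denoiser=toy_denoiser, mask=None):
    """Repeat (add weak noise, then denoise) `num_iters` times.

    If `mask` is given, only pixels where the mask is True are updated.
    """
    current = image.astype(np.float64).copy()
    for _ in range(num_iters):
        candidate = denoiser(add_noise(current, strength, rng))
        current = candidate if mask is None else np.where(mask, candidate, current)
    return current


def refine_body_then_face(rendered, face_mask, strength=0.05,
                          body_iters=3, face_iters=3, seed=0):
    """Two-stage schedule: whole image first, then the face region only."""
    rng = np.random.default_rng(seed)
    body = refine(rendered, strength, body_iters, rng)
    return refine(body, strength, face_iters, rng, mask=face_mask)
```

Running several passes with a small `strength` instead of one pass with a large value mirrors the trade-off stated in the abstract: each pass perturbs the image only slightly, so structure and identity are less likely to be destroyed, while repeated passes still give the denoiser room to correct artifacts.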