
Total Selfie: Generating Full-Body Selfies (2308.14740v2)

Published 28 Aug 2023 in cs.CV, cs.GR, and cs.LG

Abstract: We present a method to generate full-body selfies from photographs originally taken at arm's length. Because self-captured photos are typically taken close up, they have a limited field of view and exaggerated perspective that distorts facial shapes. We instead seek to generate the photo someone else would take of you from a few feet away. Our approach takes as input four selfies of your face and body and a background image, and generates a full-body selfie in a desired target pose. We introduce a novel diffusion-based approach to combine all of this information into high-quality, well-composed photos of you with the desired pose and background.
