
FashionEngine: Interactive 3D Human Generation and Editing via Multimodal Controls (2404.01655v3)

Published 2 Apr 2024 in cs.CV

Abstract: We present FashionEngine, an interactive 3D human generation and editing system that creates 3D digital humans via user-friendly multimodal controls such as natural language, visual perception, and hand-drawn sketches. FashionEngine automates 3D human production with three key components: 1) a pre-trained 3D human diffusion model that learns to model 3D humans in a semantic UV latent space from 2D image training data, providing strong priors for diverse generation and editing tasks; 2) a Multimodality-UV Space that encodes the texture appearance, shape topology, and textual semantics of human clothing in a canonical UV-aligned space, faithfully aligning the user's multimodal inputs with the implicit UV latent space for controllable 3D human editing. The Multimodality-UV Space is shared across different user inputs, such as texts, images, and sketches, which enables various joint multimodal editing tasks; 3) a Multimodality-UV Aligned Sampler that learns to sample high-quality and diverse 3D humans from the diffusion prior. Extensive experiments validate FashionEngine's state-of-the-art performance on conditional generation and editing tasks. In addition, we present an interactive user interface for FashionEngine that supports both conditional and unconditional generation, as well as editing tasks including pose/view/shape control, text-, image-, and sketch-driven 3D human editing, and 3D virtual try-on, in a unified framework. Our project page is at: https://taohuumd.github.io/projects/FashionEngine.
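The pipeline the abstract describes — a diffusion prior over a UV-aligned latent space, conditioned on multimodal inputs, with a sampler that draws latents from that prior — can be sketched at a toy level as a guided denoising loop. Everything below (`UV_SHAPE`, `toy_denoiser`, `sample_latent`, the guidance weight) is a hypothetical illustration of the general idea, not the paper's actual architecture or API.

```python
# Toy sketch: conditional sampling from a latent diffusion prior over a
# UV-aligned latent grid, in the spirit of the Multimodality-UV Aligned
# Sampler. All names and shapes here are illustrative assumptions.
import numpy as np

UV_SHAPE = (8, 8, 4)   # stand-in for a canonical UV-aligned latent grid
STEPS = 50             # number of denoising steps
GUIDANCE = 2.0         # classifier-free-guidance-style weight (assumed)

def toy_denoiser(z, t, cond):
    # Stand-in for a learned noise predictor eps(z, t, cond); here it
    # simply pulls the latent toward the conditioning embedding.
    target = cond if cond is not None else np.zeros_like(z)
    return (z - target) * (t / STEPS)

def sample_latent(cond, seed=0):
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(UV_SHAPE)           # start from Gaussian noise
    for t in range(STEPS, 0, -1):
        eps_cond = toy_denoiser(z, t, cond)     # conditional prediction
        eps_uncond = toy_denoiser(z, t, None)   # unconditional prediction
        # Blend the two predictions so the condition steers the sample.
        eps = eps_uncond + GUIDANCE * (eps_cond - eps_uncond)
        z = z - eps / STEPS                     # simple Euler-style update
    return z

# A toy "multimodal" condition: in the real system this would come from
# encoding text, an image, or a sketch into the shared UV space.
cond = np.full(UV_SHAPE, 0.5)
z = sample_latent(cond)
```

Because the conditioning lives in the same UV-aligned space for every modality, the same sampling loop could, in principle, serve text-, image-, and sketch-driven edits alike, which is the appeal of the shared Multimodality-UV Space design.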

