
AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control (2303.17606v2)

Published 30 Mar 2023 in cs.CV

Abstract: Neural implicit fields are powerful for representing 3D scenes and generating high-quality novel views, but it remains challenging to use such implicit representations for creating a 3D human avatar with a specific identity and artistic style that can be easily animated. Our proposed method, AvatarCraft, addresses this challenge by using diffusion models to guide the learning of geometry and texture for a neural avatar based on a single text prompt. We carefully design the optimization framework of neural implicit fields, including a coarse-to-fine multi-bounding box training strategy, shape regularization, and diffusion-based constraints, to produce high-quality geometry and texture. Additionally, we make the human avatar animatable by deforming the neural implicit field with an explicit warping field that maps the target human mesh to a template human mesh, both represented using parametric human models. This simplifies animation and reshaping of the generated avatar by controlling pose and shape parameters. Extensive experiments on various text descriptions show that AvatarCraft is effective and robust in creating human avatars and rendering novel views, poses, and shapes. Our project page is: https://avatar-craft.github.io/.

Authors (7)
  1. Ruixiang Jiang (7 papers)
  2. Can Wang (156 papers)
  3. Jingbo Zhang (43 papers)
  4. Menglei Chai (37 papers)
  5. Mingming He (24 papers)
  6. Dongdong Chen (164 papers)
  7. Jing Liao (100 papers)
Citations (63)

Summary

AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control

Neural implicit fields have recently shown notable efficacy in representing complex 3D scenes and synthesizing high-quality novel views. Despite these capabilities, it remains challenging to use such implicit representations to create 3D human avatars that carry a specific identity and artistic style while also supporting animation. The paper "AvatarCraft: Transforming Text into Neural Human Avatars with Parameterized Shape and Pose Control" introduces AvatarCraft, a method that addresses these challenges by using diffusion models to guide geometry and texture learning within a carefully designed optimization framework.

AvatarCraft distinguishes itself by employing diffusion models to guide the geometry and texture generation of neural avatars from text prompts. The optimization framework comprises a coarse-to-fine multi-bounding-box training strategy, shape regularization, and diffusion-based constraints, which together improve output quality. Particularly noteworthy is how the method makes the avatar animatable: the neural implicit field is deformed with an explicit warping field that maps a target human mesh to a template human mesh, both represented with a parametric human model. This simplifies animation and reshaping of the generated avatar through pose and shape parameters.
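To make the warping idea concrete, the sketch below queries a canonical (template-pose) implicit field at points sampled in the target pose. It is a minimal illustration, not the paper's implementation: `canonical_field`, the nearest-vertex correspondence, and all variable names are assumptions (the paper uses a denser SMPL surface mapping than nearest-vertex lookup).

```python
# Minimal sketch of SMPL-mesh-guided warping for querying a canonical
# (template-pose) implicit field at points sampled in the target pose.
# canonical_field, target_verts, and template_verts are illustrative
# assumptions, not the paper's actual API.
import torch

def warp_to_template(query_pts, target_verts, template_verts):
    """Map 3D points near the target-pose mesh onto the template mesh.

    query_pts:      (N, 3) sample points along camera rays (target pose)
    target_verts:   (V, 3) SMPL vertices under the target pose/shape
    template_verts: (V, 3) SMPL vertices under the template pose/shape
    """
    # Nearest-vertex correspondence (a simplification; the paper uses a
    # denser surface mapping between the two meshes).
    d = torch.cdist(query_pts, target_verts)   # (N, V) pairwise distances
    idx = d.argmin(dim=1)                      # (N,) matched vertex per point
    # Carry each point's offset from its matched vertex over to the
    # corresponding template vertex.
    offset = query_pts - target_verts[idx]
    return template_verts[idx] + offset        # (N, 3) canonical points

def render_sample(query_pts, target_verts, template_verts, canonical_field):
    canonical_pts = warp_to_template(query_pts, target_verts, template_verts)
    # canonical_field returns (density, color) for the text-generated avatar.
    return canonical_field(canonical_pts)
```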

Text-guided avatar creation poses challenges at the intersection of high-quality geometry and texture generation and flexible animation. Existing methods have addressed these by leveraging cross-modal supervision or by modeling avatars with explicit meshes to support animation. However, they often struggle to generate detailed and coherent avatar appearances, especially under animation or when restricted to simpler mesh-based representations.

In response, AvatarCraft's integration of diffusion models, widely recognized for their robust text-to-image generation, marks a shift from CLIP-based guidance toward greater consistency and detail in avatar appearance. Complementing this, the neural implicit fields used in AvatarCraft offer advantages such as superior view synthesis and realistic occlusion handling, allowing avatars to be composited seamlessly into larger implicit 3D environments.
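The diffusion-based constraint can be understood in the spirit of score distillation sampling (SDS), where a pretrained text-to-image diffusion model scores rendered views and its denoising error is pushed back into the implicit field. The sketch below is a generic SDS-style loss under assumed names (`diffusion_eps`, `alphas_cumprod`, the timestep range, and the weighting), not AvatarCraft's exact formulation.

```python
# Minimal sketch of a score-distillation-style diffusion constraint.
# diffusion_eps is an assumed noise-prediction network eps(x_t, t, text_emb);
# alphas_cumprod is the diffusion model's cumulative noise schedule and is
# assumed to live on the same device as the render.
import torch

def sds_loss(rendered_rgb, text_emb, diffusion_eps, alphas_cumprod):
    """Surrogate loss whose gradient injects the diffusion prior
    into a differentiable render of the implicit avatar.

    rendered_rgb: (B, 3, H, W) image rendered from the implicit field
    text_emb:     conditioning embedding for the text prompt
    """
    B = rendered_rgb.shape[0]
    # Random mid-range timesteps (range is a common choice, an assumption).
    t = torch.randint(20, 980, (B,), device=rendered_rgb.device)
    a = alphas_cumprod[t].view(B, 1, 1, 1)
    noise = torch.randn_like(rendered_rgb)
    x_t = a.sqrt() * rendered_rgb + (1 - a).sqrt() * noise  # forward diffuse
    with torch.no_grad():                 # do not differentiate the U-Net
        eps_pred = diffusion_eps(x_t, t, text_emb)
    w = 1 - a                             # common SDS weighting choice
    grad = w * (eps_pred - noise)         # desired gradient w.r.t. the render
    # Surrogate: d(loss)/d(rendered_rgb) == grad, bypassing the U-Net.
    return (grad.detach() * rendered_rgb).sum()
```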

AvatarCraft's coarse-to-fine multi-bounding-box training strategy captures style details across scales, keeping the synthesis faithful to the desired artistic expression while preserving definition in both fine and large-scale features. The proposed shape regularization further stabilizes optimization, yielding anatomically plausible avatars without sacrificing stylistic detail.
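One plausible reading of the coarse-to-fine multi-bounding-box strategy is a sampling schedule that first supervises full-body renders, then mixes in close-up crops of part-level boxes so the diffusion guidance also sees fine detail. The box coordinates and schedule below are illustrative assumptions only.

```python
# Minimal sketch of a coarse-to-fine, multi-bounding-box sampling schedule.
# Box centers/scales and the 30% warm-up threshold are assumptions.
import random

# Hypothetical (box_center, box_scale) regions: whole body plus part crops.
BODY_BOX = ((0.0, 0.0, 0.0), 1.0)
PART_BOXES = [((0.0, 0.65, 0.0), 0.25),   # head crop
              ((0.0, 0.0, 0.0), 0.45)]    # torso crop

def sample_training_box(step, total_steps, p_part=0.5):
    """Early steps see only the full body (coarse stage); later steps
    randomly mix in close-up part boxes (fine stage)."""
    if step < 0.3 * total_steps or random.random() > p_part:
        return BODY_BOX
    return random.choice(PART_BOXES)
```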

Furthermore, SMPL-guided deformation provides intuitive control over both pose and shape parameters, enabling applications from animation to shape customization without additional model training. These capabilities have practical implications for the graphics and media industries: high-quality content generation with less reliance on manual work by skilled artists, and potential advances in virtual reality and entertainment.
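Because the warp is driven by parametric meshes, animating or reshaping the avatar reduces to regenerating the target mesh with new SMPL parameters. The sketch below uses the public `smplx` package (model files must be downloaded separately); the model path, the chosen joint index, and the `warp_to_template` helper from the earlier sketch are assumptions.

```python
# Minimal sketch of animating/reshaping the generated avatar by driving
# SMPL pose/shape parameters; no retraining of the implicit field is needed.
import torch
import smplx

# 'models/' must contain downloaded SMPL model files (path is an assumption).
smpl = smplx.create('models/', model_type='smpl')

betas = torch.zeros(1, 10)       # shape coefficients (controls reshaping)
body_pose = torch.zeros(1, 69)   # 23 joints x 3 axis-angle params (animation)
body_pose[0, 3 * 17 + 2] = 0.8   # e.g. rotate one joint (index is illustrative)

target = smpl(betas=betas, body_pose=body_pose).vertices[0]    # posed mesh
template = smpl(betas=torch.zeros(1, 10),
                body_pose=torch.zeros(1, 69)).vertices[0]      # template mesh
# target/template now feed warp_to_template() from the earlier sketch,
# so pose and shape are controlled directly through SMPL parameters.
```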

Ultimately, AvatarCraft represents a significant contribution to the intersection of AI-driven content creation and graphical representation, yielding practical tools and techniques for realizing complex avatar designs and enhancing animation capabilities. Future research may explore expanding the diffusion model's robustness to handle underrepresented viewpoints more effectively, thereby ensuring uniform quality and accuracy across diverse avatar configurations and settings.
