TexVocab: Texture Vocabulary-conditioned Human Avatars
Abstract: To fully exploit the image evidence available in multi-view video-based avatar modeling, we propose TexVocab, a novel avatar representation that constructs a texture vocabulary and associates body poses with texture maps for animation. Given multi-view RGB videos, our method first back-projects all available images in the training videos onto the posed SMPL surface, producing texture maps in the SMPL UV domain. We then construct pairs of human poses and texture maps to establish a texture vocabulary that encodes dynamic human appearance under various poses. Unlike the commonly used joint-wise encoding, we further design a body-part-wise encoding strategy to learn the structural effects of the kinematic chain. Given a driving pose, we query the pose feature hierarchically by decomposing the pose vector into several body parts and interpolating the texture features to synthesize fine-grained human dynamics. Overall, our method creates animatable human avatars with detailed and dynamic appearance from RGB videos, and experiments show that it outperforms state-of-the-art approaches. The project page can be found at https://texvocab.github.io/.
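To make the body-part-wise query concrete, the following is a minimal sketch of how a texture vocabulary could be queried for one body part: the driving pose vector is decomposed into the part's joints, and texture features of the vocabulary's key poses are interpolated by pose distance. All names, shapes, and the softmax-over-distance weighting are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def query_texture_vocab(driving_pose, vocab, part_joint_ids, temperature=0.1):
    """Interpolate texture features for one body part (hypothetical sketch).

    driving_pose   : (J, 3) axis-angle rotations for all J joints.
    vocab          : list of (key_pose, texture_feature) pairs, where
                     key_pose is (J, 3) and texture_feature is (C,).
    part_joint_ids : joint indices belonging to this body part
                     (the decomposition of the pose vector into parts).
    """
    part_pose = driving_pose[part_joint_ids].ravel()
    dists = np.array([
        np.linalg.norm(part_pose - key[part_joint_ids].ravel())
        for key, _ in vocab
    ])
    # Softmax over negative distances: nearer key poses get larger weights.
    weights = np.exp(-dists / temperature)
    weights /= weights.sum()
    feats = np.stack([feat for _, feat in vocab])  # (K, C)
    return weights @ feats  # blended texture feature, shape (C,)

# Toy usage: 24 SMPL joints, a 2-entry vocabulary, a left-arm part.
rng = np.random.default_rng(0)
vocab = [(rng.standard_normal((24, 3)), rng.standard_normal(8)) for _ in range(2)]
blended = query_texture_vocab(rng.standard_normal((24, 3)), vocab, [16, 18, 20])
print(blended.shape)
```

Repeating this per body part and feeding the blended features to a renderer would reflect the hierarchical, part-wise structure the abstract describes; the distance metric and blending scheme here are stand-ins for whatever the method actually uses.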