VINECS: Video-based Neural Character Skinning (2307.00842v1)

Published 3 Jul 2023 in cs.CV

Abstract: Rigging and skinning clothed human avatars is a challenging task and traditionally requires a lot of manual work and expertise. Recent methods addressing it either generalize across different characters or focus on capturing the dynamics of a single character observed under different pose configurations. However, the former methods typically predict solely static skinning weights, which perform poorly for highly articulated poses, and the latter ones either require dense 3D character scans in different poses or cannot generate an explicit mesh with vertex correspondence over time. To address these challenges, we propose a fully automated approach for creating a fully rigged character with pose-dependent skinning weights, which can be solely learned from multi-view video. Therefore, we first acquire a rigged template, which is then statically skinned. Next, a coordinate-based MLP learns a skinning weights field parameterized over the position in a canonical pose space and the respective pose. Moreover, we introduce our pose- and view-dependent appearance field allowing us to differentiably render and supervise the posed mesh using multi-view imagery. We show that our approach outperforms state-of-the-art while not relying on dense 4D scans.


Summary

  • The paper introduces an end-to-end trainable system that automates rigging and skinning of 3D human avatars directly from multi-view video inputs.
  • It leverages a coordinate-based MLP to compute pose-dependent skinning weights, enabling realistic deformations without dense 3D scans.
  • The approach outperforms state-of-the-art methods by reducing reconstruction errors and effectively handling challenging dynamic poses and loose clothing.

Insights on Video-based Neural Character Skinning (VINECS)

The paper entitled "VINECS: Video-based Neural Character Skinning" presents an innovative approach to automating the rigging and skinning of 3D human avatars directly from multi-view video data. This is a significant advancement in the field of computer graphics and vision, where manual rigging and skinning are typically labor-intensive and require substantial expertise. Traditional methods often fail to accommodate dynamic and highly articulated poses due to reliance on static skinning weights or dense scans of 3D characters in different configurations. VINECS addresses these limitations through a novel methodology that leverages multi-view video to create fully rigged characters with pose-dependent skinning weights.

Technical Contributions

VINECS introduces a coordinate-based multi-layer perceptron (MLP) model to learn skinning weights that vary with pose, enabling the generation of realistic deformations in character animations. This is achieved without the necessity for dense 3D scans or manual adjustments, making the process more accessible and efficient. Key contributions of the paper include:

  1. End-to-End Trainable System: The system can generate animation-ready explicit character meshes directly from video inputs, incorporating both rigging and pose-dependent skinning.
  2. Pose-Dependent Skinning Formulation: The use of an MLP allows for continuous sampling of skinning weights across the 3D canonical space, facilitating robust multi-resolution character skinning.
  3. Differentiable Rendering with Supervision: The approach incorporates a unique appearance model that provides pose- and view-dependent rendering, enhancing weak supervision using silhouette and rendering losses.
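To make the second contribution concrete, the skinning-weight field can be pictured as a small coordinate-based MLP that maps a canonical-space position plus a pose code to a softmax-normalized weight per joint, which linear blend skinning (LBS) then uses to mix joint transforms. The sketch below is not the paper's implementation; the layer sizes, variable names, and random parameters are invented purely for illustration.

```python
import numpy as np

def mlp_skinning_weights(x_canon, pose_vec, params):
    """Toy coordinate-based MLP: canonical position + pose code -> per-joint
    skinning weights. The softmax keeps them non-negative and summing to 1."""
    h = np.concatenate([x_canon, pose_vec])
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)       # ReLU hidden layers
    W, b = params[-1]
    logits = W @ h + b
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

def lbs(v_canon, weights, joint_transforms):
    """Linear blend skinning: blend 4x4 joint transforms by the weights,
    then apply the blended transform to the canonical vertex."""
    T = np.tensordot(weights, joint_transforms, axes=1)  # (4, 4)
    v_h = np.append(v_canon, 1.0)                        # homogeneous coords
    return (T @ v_h)[:3]

# Hypothetical setup: 4 joints, 8-dim pose code, 16 hidden units.
rng = np.random.default_rng(0)
J, D_pose, H = 4, 8, 16
params = [
    (rng.normal(size=(H, 3 + D_pose)), np.zeros(H)),
    (rng.normal(size=(J, H)), np.zeros(J)),
]
x = np.array([0.1, 0.5, -0.2])
w = mlp_skinning_weights(x, rng.normal(size=D_pose), params)
transforms = np.stack([np.eye(4)] * J)  # identity pose: vertex stays put
v = lbs(x, w, transforms)
```

Because the weights sum to one, blending identity transforms yields the identity, so the canonical vertex is reproduced exactly; pose dependence in VINECS comes from feeding the pose code into the weight network rather than from the LBS blend itself.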

Results and Comparative Analysis

VINECS is evaluated against state-of-the-art methods such as SCANimate and SNARF, which traditionally require dense point-cloud data for training. The proposed method achieves lower reconstruction errors (measured by Chamfer and Hausdorff distances) across multiple test subjects, despite training solely on multi-view video data. Notably, VINECS demonstrates improved accuracy over SCANimate on subjects wearing loose clothing, highlighting its robustness to a variety of clothing types.
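For reference, the Chamfer distance used in such comparisons is typically the symmetric average nearest-neighbour distance between reconstructed and ground-truth point sets (some variants use squared distances instead). A minimal NumPy sketch, not the authors' evaluation code:

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (N, 3) and Q (M, 3):
    mean nearest-neighbour distance from P to Q plus from Q to P.
    (Some definitions average squared distances instead.)"""
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
Q = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])
cd = chamfer_distance(P, Q)  # small perturbation -> small distance
```

The brute-force pairwise matrix is fine for illustration; real evaluations on dense meshes would use a k-d tree for the nearest-neighbour queries.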

Discussion of Implications

The practical implications of VINECS are vast, as it streamlines the creation of animatable 3D characters from easily obtainable video inputs. This can significantly lower barriers for industries relying on character animation, such as game development, film production, and virtual reality applications. From a theoretical standpoint, VINECS contributes to the ongoing dialogue about the capabilities of neural networks in graphics, particularly in generating realistic, articulated human figures solely from visual data.

Future Developments

Future exploration might extend VINECS by integrating facial expression modeling or by improving computational efficiency with advanced neural representations such as hash grids. Further research could also tackle simultaneous rigging, skinning, and pose tracking, streamlining the production of high-quality, animation-ready representations of diverse character models.

To conclude, "VINECS: Video-based Neural Character Skinning" not only advances the automation of rigging and skinning but also lays a foundation for future work on dynamic 3D character animation with neural methods that operate directly on multi-view video, marking a substantive step forward in computer-generated character production.
