NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors (2403.03122v2)

Published 5 Mar 2024 in cs.CV

Abstract: Faithfully modeling the space of articulations is a crucial task that allows recovery and generation of realistic poses, and remains a notorious challenge. To this end, we introduce Neural Riemannian Distance Fields (NRDFs), data-driven priors modeling the space of plausible articulations, represented as the zero-level-set of a neural field in a high-dimensional product-quaternion space. To train NRDFs only on positive examples, we introduce a new sampling algorithm, ensuring that the geodesic distances follow a desired distribution, yielding a principled distance field learning paradigm. We then devise a projection algorithm to map any random pose onto the level-set by an adaptive-step Riemannian optimizer, adhering to the product manifold of joint rotations at all times. NRDFs can compute the Riemannian gradient via backpropagation and by mathematical analogy, are related to Riemannian flow matching, a recent generative model. We conduct a comprehensive evaluation of NRDF against other pose priors in various downstream tasks, i.e., pose generation, image-based pose estimation, and solving inverse kinematics, highlighting NRDF's superior performance. Besides humans, NRDF's versatility extends to hand and animal poses, as it can effectively represent any articulation.

Summary

  • The paper presents a neural framework leveraging Riemannian distance fields to model realistic human pose manifolds.
  • It introduces an adaptive-step Riemannian optimizer that efficiently projects arbitrary articulations onto the learned pose manifold.
  • It details a novel sampling method that controls the distance distribution of the training examples, so that the learned pose manifold is well defined.

Neural Riemannian Distance Fields for Learning Articulated Pose Priors

Overview

Articulated pose estimation and generation remain challenging problems in computer vision and graphics, largely because the high-dimensional space of realistic human poses is hard to model. This paper introduces Neural Riemannian Distance Fields (NRDFs), an approach that models the manifold of plausible human articulations as a neural field in a high-dimensional product-quaternion space. At its core, NRDF is a method for learning data-driven priors for articulated shapes, and it supports applications including pose generation, inverse kinematics, and human pose estimation from images.

Methodology

The key contributions of this paper include the introduction of a principled framework for learning Neural Distance Fields (NDFs) on Riemannian manifolds, an adaptive-step Riemannian gradient descent algorithm for efficient projection onto the pose manifold, and a novel sampling method crucial for effective pose manifold learning. Together, these components form the foundation of NRDF, allowing it to model the space of realistic human poses effectively.

Neural Riemannian Distance Fields

NRDFs are learned by training a hierarchical network to predict the geodesic distance to the nearest realistic pose within a given dataset. This training process is underpinned by a new sampling method on Riemannian manifolds, enabling explicit control over the resulting distance distribution of training examples. Crucially, the network predicts distances in a high-dimensional product-quaternion space, adhering closely to the geometric structure of human articulations.
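To make the notion of distance in a product-quaternion space concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each pose is an array of K unit quaternions, that the per-joint distance is arccos(|<q, q'>|) (which treats q and -q as the same rotation), and that per-joint distances are combined as a root sum of squares. The function names and the brute-force nearest-neighbour search are illustrative choices.

```python
import numpy as np

def quat_geodesic(q1, q2):
    """Per-joint geodesic distance between unit quaternions.

    q1, q2: arrays of shape (..., K, 4). Returns shape (..., K).
    The absolute value makes q and -q (the same rotation) distance zero.
    """
    dot = np.abs(np.sum(q1 * q2, axis=-1))
    return np.arccos(np.clip(dot, 0.0, 1.0))

def pose_distance(p1, p2):
    """Distance on the product manifold: root sum of squared joint distances."""
    return np.sqrt(np.sum(quat_geodesic(p1, p2) ** 2, axis=-1))

def distance_to_dataset(query, dataset):
    """Distance from one pose (K, 4) to the nearest pose in dataset (N, K, 4).

    Brute-force search, used here only as a stand-in for the regression
    target that a distance field would be trained to predict.
    """
    return float(pose_distance(query[None], dataset).min())
```

A network trained on such targets would take a pose as input and regress this scalar, so that plausible poses lie on its zero-level-set.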

Adaptive-Step Riemannian Optimizer

To project arbitrary articulations onto the learned pose manifold, the paper introduces an adaptive-step Riemannian optimizer, RDFGrad. This optimization algorithm leverages the Riemannian structure of the pose space, so that every iterate stays on the product manifold of joint rotations. According to the authors, this accelerates convergence and improves the fidelity of projected poses compared with previous methods.
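The projection idea can be sketched as Riemannian gradient descent on a product of unit-quaternion spheres. The sketch below is an illustration under stated assumptions, not the paper's RDFGrad: the learned network is replaced by a hypothetical analytic field (`toy_distance_field`, the distance to a single target pose), the step length is assumed to be `alpha` times the current predicted distance, and all function names are invented for this example. In the actual method the gradient would come from backpropagation through the network.

```python
import numpy as np

def toy_distance_field(pose, target):
    """Stand-in for a learned field: distance to one target pose (K, 4).

    Returns the scalar distance and its Euclidean gradient w.r.t. pose.
    """
    s = np.sum(pose * target, axis=-1)
    sign = np.where(s >= 0.0, 1.0, -1.0)
    c = np.clip(np.abs(s), 0.0, 1.0)
    d = np.arccos(c)                       # per-joint geodesic distance
    f = np.sqrt(np.sum(d ** 2))            # product-manifold distance
    denom = np.sqrt(np.maximum(1.0 - c ** 2, 1e-12))
    grad = -(d * sign / (denom * max(f, 1e-12)))[:, None] * target
    return f, grad

def to_tangent(pose, grad):
    """Riemannian gradient: drop the component along each quaternion."""
    return grad - np.sum(grad * pose, axis=-1, keepdims=True) * pose

def exp_map(pose, v):
    """Per-joint exponential map on the unit 3-sphere."""
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(n > 1e-12, n, 1.0)
    out = np.cos(n) * pose + np.sin(n) * v / safe
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

def project(pose, field, alpha=0.5, tol=1e-4, max_iters=100):
    """Move along the negative Riemannian gradient with a step that
    shrinks as the predicted distance shrinks (assumed step rule)."""
    for _ in range(max_iters):
        f, g = field(pose)
        if f < tol:
            break
        rg = to_tangent(pose, g)
        norm = np.linalg.norm(rg)
        if norm < 1e-12:
            break
        pose = exp_map(pose, -alpha * f * rg / norm)
    return pose
```

Because each update goes through the exponential map, every iterate consists of unit quaternions, so no re-projection of an off-manifold Euclidean step is needed.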

Sampling for Training Data

The paper also details a novel method for generating training data, crucial for effectively learning the pose manifold. By controlling the distribution of distances in the generated training examples, the authors ensure that the learned NRDF can capture detailed and well-defined articulations. This contrasts sharply with previous heuristics, which often lead to poorly defined manifolds.
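One way to control the distance distribution is to draw the desired distance first and then move exactly that far along a random geodesic. The sketch below illustrates this idea and is not the authors' algorithm: the half-normal distribution, the `scale` parameter, the cap at pi/2, and the function name `sample_noisy_pose` are all assumptions made for the example.

```python
import numpy as np

def sample_noisy_pose(pose, rng, scale=0.3):
    """Perturb a clean pose (K, 4) of unit quaternions so that its geodesic
    distance to the clean pose equals a value drawn in advance.

    Returns the noisy pose and the drawn distance.
    """
    # 1. Draw the target distance (assumed half-normal). The cap keeps each
    #    joint within pi/2, where arccos(|<q, q'>|) equals the step length.
    dist = min(abs(rng.normal(0.0, scale)), np.pi / 2)

    # 2. Draw a random direction in the tangent space of the product
    #    manifold and normalise it over the whole pose.
    v = rng.normal(size=pose.shape)
    v = v - np.sum(v * pose, axis=-1, keepdims=True) * pose
    v = dist * v / np.linalg.norm(v)

    # 3. Follow the geodesic with the per-joint exponential map.
    n = np.linalg.norm(v, axis=-1, keepdims=True)
    noisy = np.cos(n) * pose + np.sin(n) * v / n
    noisy = noisy / np.linalg.norm(noisy, axis=-1, keepdims=True)
    return noisy, dist
```

Adding Gaussian noise in a Euclidean parameterisation and renormalising would give no such direct control over the resulting distances. If the training label is the distance to the nearest pose in the dataset rather than to the source pose, a nearest-neighbour search over the dataset would still be required; the sketch covers only the sampling step.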

Implications and Future Developments

NRDFs have implications for several problems in computer vision and graphics. By providing a method for learning detailed models of human pose manifolds, this research supports realistic pose generation, accurate inverse kinematics solutions, and improved human pose estimation from images. Moreover, NRDFs extend beyond human bodies to the articulations of hands and animals, which demonstrates their wide applicability.

Looking forward, the authors suggest several promising directions for further research. These include exploring noise injection during projection to enhance pose diversity, modeling manifold uncertainty, and extending the methodology to other complex articulated shapes. As the field continues to advance, NRDFs offer a powerful tool for pushing the boundaries of what is possible in understanding and replicating human motion.