
Reconstructing Close Human Interactions from Multiple Views (2401.16173v1)

Published 29 Jan 2024 in cs.CV

Abstract: This paper addresses the challenging task of reconstructing the poses of multiple individuals engaged in close interactions, captured by multiple calibrated cameras. The difficulty arises from the noisy or false 2D keypoint detections due to inter-person occlusion, the heavy ambiguity in associating keypoints to individuals due to the close interactions, and the scarcity of training data as collecting and annotating motion data in crowded scenes is resource-intensive. We introduce a novel system to address these challenges. Our system integrates a learning-based pose estimation component and its corresponding training and inference strategies. The pose estimation component takes multi-view 2D keypoint heatmaps as input and reconstructs the pose of each individual using a 3D conditional volumetric network. As the network doesn't need images as input, we can leverage known camera parameters from test scenes and a large quantity of existing motion capture data to synthesize massive training data that mimics the real data distribution in test scenes. Extensive experiments demonstrate that our approach significantly surpasses previous approaches in terms of pose accuracy and is generalizable across various camera setups and population sizes. The code is available on our project page: https://github.com/zju3dv/CloseMoCap.
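Because the network takes 2D keypoint heatmaps rather than images, a training sample can in principle be produced by projecting motion capture joints through the known cameras of a test scene. The following is a minimal sketch of that idea, not the authors' implementation. It assumes an ideal pinhole camera with no lens distortion and a Gaussian heatmap per joint, and the function names (`project_point`, `gaussian_heatmap`, `synthesize_heatmaps`) are hypothetical. It renders clean heatmaps only and does not model the detection noise or inter-person occlusion that the abstract identifies as the main difficulty.

```python
import math


def project_point(point, K, R, t):
    """Project a world-space 3D point to pixel coordinates (pinhole model, no skew).

    K is a 3x3 intrinsic matrix, R a 3x3 rotation, t a 3-vector (world -> camera).
    Returns (u, v), or None if the point lies behind the camera.
    """
    # Camera-space coordinates: X_cam = R * X_world + t
    cam = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if cam[2] <= 0:
        return None
    # Perspective divide followed by focal length and principal point
    u = K[0][0] * cam[0] / cam[2] + K[0][2]
    v = K[1][1] * cam[1] / cam[2] + K[1][2]
    return (u, v)


def gaussian_heatmap(center, width, height, sigma=2.0):
    """Render a height x width heatmap with an unnormalized Gaussian at `center`."""
    cu, cv = center
    return [
        [math.exp(-((u - cu) ** 2 + (v - cv) ** 2) / (2.0 * sigma ** 2))
         for u in range(width)]
        for v in range(height)
    ]


def synthesize_heatmaps(joints_3d, cameras, width, height, sigma=2.0):
    """Return heatmaps indexed as [camera][joint][row][column].

    `cameras` is a list of (K, R, t) tuples. Joints behind a camera get an
    all-zero heatmap (an assumption of this sketch).
    """
    all_views = []
    for K, R, t in cameras:
        view = []
        for joint in joints_3d:
            uv = project_point(joint, K, R, t)
            if uv is None:
                view.append([[0.0] * width for _ in range(height)])
            else:
                view.append(gaussian_heatmap(uv, width, height, sigma))
        all_views.append(view)
    return all_views


# Example: one camera 5 units in front of the origin, two joints of one person.
K = [[100.0, 0.0, 32.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
heatmaps = synthesize_heatmaps([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]], [(K, R, t)], 64, 48)
```

In this example the joint at the world origin projects to the principal point, so its heatmap peaks at row 24, column 32. A multi-person sample would repeat the same projection for every person placed in the scene.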
