BEDLAM: A Synthetic Dataset of Bodies Exhibiting Detailed Lifelike Animated Motion (2306.16940v1)

Published 29 Jun 2023 in cs.CV

Abstract: We show, for the first time, that neural networks trained only on synthetic data achieve state-of-the-art accuracy on the problem of 3D human pose and shape (HPS) estimation from real images. Previous synthetic datasets have been small, unrealistic, or lacked realistic clothing. Achieving sufficient realism is non-trivial and we show how to do this for full bodies in motion. Specifically, our BEDLAM dataset contains monocular RGB videos with ground-truth 3D bodies in SMPL-X format. It includes a diversity of body shapes, motions, skin tones, hair, and clothing. The clothing is realistically simulated on the moving bodies using commercial clothing physics simulation. We render varying numbers of people in realistic scenes with varied lighting and camera motions. We then train various HPS regressors using BEDLAM and achieve state-of-the-art accuracy on real-image benchmarks despite training with synthetic data. We use BEDLAM to gain insights into what model design choices are important for accuracy. With good synthetic training data, we find that a basic method like HMR approaches the accuracy of the current SOTA method (CLIFF). BEDLAM is useful for a variety of tasks and all images, ground truth bodies, 3D clothing, support code, and more are available for research purposes. Additionally, we provide detailed information about our synthetic data generation pipeline, enabling others to generate their own datasets. See the project page: https://bedlam.is.tue.mpg.de/.

Authors (4)
  1. Michael J. Black
  2. Priyanka Patel
  3. Joachim Tesch
  4. Jinlong Yang
