
EA-RAS: Towards Efficient and Accurate End-to-End Reconstruction of Anatomical Skeleton (2409.01555v1)

Published 3 Sep 2024 in cs.CV and cs.AI

Abstract: Efficient, accurate, and low-cost estimation of human skeletal information is crucial for a range of applications such as biology education and human-computer interaction. However, current simple skeleton models, which are typically based on 2D-3D joint points, fall short in anatomical fidelity, restricting their utility in such fields. On the other hand, more complex models, while anatomically precise, are hindered by sophisticated multi-stage processing and the need for extra data like skin meshes, making them unsuitable for real-time applications. To this end, we propose EA-RAS (Towards Efficient and Accurate End-to-End Reconstruction of Anatomical Skeleton), a single-stage, lightweight, and plug-and-play anatomical skeleton estimator that can provide real-time, accurate, anatomically realistic skeletons in arbitrary poses using only a single RGB image as input. Additionally, EA-RAS explicitly estimates the conventional human-mesh model, which not only enhances functionality but also leverages outside skin information by integrating its features into the inside skeleton modeling process. In this work, we also develop a progressive training strategy and integrate it with an enhanced optimization process, enabling the network to obtain initial weights using only a small skin dataset and to achieve self-supervision in skeleton reconstruction. We further provide an optional lightweight post-processing optimization strategy to improve accuracy in scenarios that prioritize precision over real-time processing. Experiments demonstrate that our regression method is over 800 times faster than existing methods, meeting real-time requirements, and that the post-processing optimization strategy can enhance reconstruction accuracy by over 50% while achieving a speed increase of more than 7 times.
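The abstract describes two inference modes: a fast single-stage regression path (real-time) and an optional post-processing optimization path (higher accuracy, slower). The toy sketch below illustrates that two-path structure only; the function names, joint count, and the trivial "refinement" update are illustrative assumptions, not the authors' actual architecture or optimizer.

```python
import numpy as np

NUM_JOINTS = 24  # assumed SMPL-style joint count (illustrative)

def regress_skeleton(rgb_image: np.ndarray) -> np.ndarray:
    """Single-stage regression path: image -> 3D joint positions.

    Stand-in for a learned encoder + regression head; here we just
    derive a deterministic pseudo-prediction from image statistics.
    """
    assert rgb_image.ndim == 3 and rgb_image.shape[2] == 3
    seed = int(rgb_image.astype(np.float64).sum()) % (2**32)
    rng = np.random.default_rng(seed)
    return rng.standard_normal((NUM_JOINTS, 3))

def refine_skeleton(joints: np.ndarray, steps: int = 10) -> np.ndarray:
    """Optional post-processing path: iteratively pull the regressed
    joints toward an anatomical prior (a zero pose here, as a stand-in).
    """
    prior = np.zeros_like(joints)
    refined = joints.copy()
    for _ in range(steps):
        refined += 0.1 * (prior - refined)  # simple fixed-step update
    return refined

image = np.zeros((256, 256, 3), dtype=np.uint8)
fast = regress_skeleton(image)      # real-time mode
accurate = refine_skeleton(fast)    # precision-over-speed mode
print(fast.shape, accurate.shape)
```

The design point the sketch captures is that refinement is strictly optional: callers needing real-time output stop after `regress_skeleton`, while accuracy-critical callers pay extra iterations in `refine_skeleton`.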

