
Multi-person 3D pose estimation from unlabelled data (2212.08731v3)

Published 16 Dec 2022 in cs.CV and cs.AI

Abstract: Its numerous applications make multi-human 3D pose estimation a remarkably impactful area of research. Nevertheless, in a multi-view system composed of several regular RGB cameras, 3D multi-pose estimation presents several challenges. First, each person must be uniquely identified across the different views to separate the 2D information provided by the cameras. Second, the 3D pose estimation process from the multi-view 2D information of each person must be robust against noise and potential occlusions in the scene. In this work, we address these two challenges with the help of deep learning. Specifically, we present a model based on Graph Neural Networks that predicts the cross-view correspondences of the people in the scene, together with a Multilayer Perceptron that takes the 2D points and yields the 3D pose of each person. Both models are trained in a self-supervised manner, thus avoiding the need for large datasets with 3D annotations.
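The lifting step and the self-supervised signal described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the joint count, number of views, pinhole camera matrices, pixel normalization, and the randomly initialized MLP weights are all assumptions made here for the example. The key idea it shows is that a reprojection loss compares the lifted 3D joints against the per-view 2D detections, so no 3D annotations are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

N_VIEWS, N_JOINTS = 4, 17          # assumed counts, not fixed by the abstract
IN_DIM = N_VIEWS * N_JOINTS * 2    # stacked 2D detections from all views
OUT_DIM = N_JOINTS * 3             # one 3D point per joint
HID = 128                          # hypothetical hidden width

# Random weights stand in for a trained lifting MLP.
W1 = rng.normal(0.0, 0.05, (IN_DIM, HID)); b1 = np.zeros(HID)
W2 = rng.normal(0.0, 0.05, (HID, OUT_DIM)); b2 = np.zeros(OUT_DIM)

def lift_to_3d(points_2d):
    """MLP: flattened multi-view 2D joints -> 3D pose (N_JOINTS, 3)."""
    x = points_2d.reshape(-1) / 640.0          # normalize pixel coordinates
    h = np.maximum(0.0, x @ W1 + b1)           # ReLU hidden layer
    return (h @ W2 + b2).reshape(N_JOINTS, 3)

def project(points_3d, P):
    """Pinhole projection of 3D joints with a 3x4 camera matrix P."""
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def reprojection_loss(points_3d, detections_2d, cameras):
    """Self-supervised signal: mean distance between the reprojected
    3D joints and each view's 2D detections (no 3D labels involved)."""
    errs = [np.linalg.norm(project(points_3d, P) - det, axis=1).mean()
            for P, det in zip(cameras, detections_2d)]
    return float(np.mean(errs))

# Toy detections and cameras (identity rotation, translated along z).
detections = rng.uniform(0.0, 640.0, (N_VIEWS, N_JOINTS, 2))
cameras = [np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
           for _ in range(N_VIEWS)]

pose_3d = lift_to_3d(detections)
loss = reprojection_loss(pose_3d, detections, cameras)
```

Minimizing this loss over the MLP weights (by gradient descent in a real framework) drives the 3D prediction toward consistency with all camera views, which is the essence of the annotation-free training the paper describes; the cross-view person association handled by the Graph Neural Network is assumed already resolved here.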

