
NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images (2402.18196v2)

Published 28 Feb 2024 in cs.CV and cs.GR

Abstract: Human pose estimation (HPE) in the top view using fisheye cameras is a promising application domain. However, datasets capturing this viewpoint are extremely scarce, especially those with high-quality 2D and 3D keypoint annotations. To address this gap, we use the Neural Radiance Fields (NeRF) technique to build a comprehensive pipeline that generates human pose datasets from existing 2D and 3D datasets, tailored specifically to the top-view fisheye perspective. With this pipeline, we create a novel dataset, NToP570K (NeRF-powered Top-view human Pose dataset for fisheye cameras, with over 570 thousand images), and extensively evaluate how well it improves neural networks for 2D and 3D top-view human pose estimation. A pretrained ViTPose-B model achieves an improvement in AP of 33.3% on our validation set for 2D HPE after finetuning on our training set. A similarly finetuned HybrIK-Transformer model achieves a 53.7 mm reduction in PA-MPJPE for 3D HPE on the validation set.
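
The abstract reports its 3D result in PA-MPJPE (Procrustes-aligned mean per-joint position error). As a rough illustration of how this metric is commonly computed, the sketch below aligns predicted joints to the ground truth with a similarity transform (scale, rotation, translation) and then averages the per-joint Euclidean distances. This is a generic NumPy sketch, not the paper's evaluation code; the function names `procrustes_align` and `pa_mpjpe` and the 17-joint example are assumptions made for illustration.

```python
import numpy as np


def procrustes_align(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Similarity-align pred (J, 3) to gt (J, 3); returns the aligned prediction.

    Hypothetical helper for illustration, following the usual Kabsch/Umeyama steps.
    """
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g          # center both joint sets
    H = p.T @ g                            # 3x3 cross-covariance
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                     # rotation mapping p onto g
    scale = (S * np.diag(D)).sum() / (p ** 2).sum()
    return scale * p @ R.T + mu_g


def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean per-joint error after Procrustes alignment, in the units of the inputs."""
    aligned = procrustes_align(pred, gt)
    return float(np.linalg.norm(aligned - gt, axis=1).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gt = rng.normal(size=(17, 3))                    # assumed 17-joint skeleton
    pred = gt + 0.05 * rng.normal(size=gt.shape)     # synthetic noisy prediction
    print(f"PA-MPJPE: {pa_mpjpe(pred, gt):.4f}")
```

Because the alignment removes global scale, rotation, and translation, the metric measures only the error in the articulated pose itself. If the joint coordinates are given in millimetres, the result is in millimetres as well.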

