DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation (2404.14025v2)
Abstract: Multi-person pose estimation (MPPE) is a challenging yet crucial task in computer vision. Most existing methods model only one kind of interaction in isolation, either between instances or between joints, which is inadequate for scenarios that require localizing instances and joints concurrently. This paper introduces a CNN-based single-stage method, the Dual-path Hierarchical Relation Network (DHRNet), which extracts instance-to-joint and joint-to-instance interactions concurrently. Specifically, we design a dual-path interaction modeling module (DIM) that arranges cross-instance and cross-joint interaction modeling modules in two complementary orders, enriching interaction information by combining the strengths of the different correlation modeling branches. Notably, DHRNet excels at joint localization by leveraging information from other instances and joints. Extensive evaluations on the challenging COCO, CrowdPose, and OCHuman datasets demonstrate DHRNet's state-of-the-art performance. The code will be released at https://github.com/YHDang/dhrnet-multi-pose-estimation.
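The idea of applying cross-instance and cross-joint interaction modules in two complementary orders can be illustrated with a toy NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of the modules or how the two paths are fused, so the residual dot-product mixing, the summation used for fusion, the `(N, K, C)` feature layout, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mix_along_axis(feats, axis):
    """Hypothetical interaction module: residual dot-product mixing
    along one axis of an (N instances, K joints, C channels) tensor."""
    x = np.moveaxis(feats, axis, 1)                # (batch, tokens, C)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    out = x + softmax(scores, axis=-1) @ x         # residual update
    return np.moveaxis(out, 1, axis)               # restore (N, K, C)

def cross_instance(feats):
    # Each joint type exchanges information across all instances.
    return mix_along_axis(feats, axis=0)

def cross_joint(feats):
    # Each instance exchanges information across its own joints.
    return mix_along_axis(feats, axis=1)

def dual_path(feats):
    """Run the two modules in both orders and fuse by summation
    (the fusion rule is an assumption, not taken from the paper)."""
    path_a = cross_joint(cross_instance(feats))    # instances first, then joints
    path_b = cross_instance(cross_joint(feats))    # joints first, then instances
    return path_a + path_b

rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 17, 8))            # 3 people, 17 joints, 8 channels
fused = dual_path(feats)
print(fused.shape)
```

Because each module is nonlinear, the two orders generally yield different features, which is what makes combining both paths potentially more informative than either one alone.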