HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation (1908.10357v3)

Published 27 Aug 2019 in cs.CV, cs.LG, and eess.IV

Abstract: Bottom-up human pose estimation methods have difficulty predicting the correct pose for small persons due to challenges in scale variation. In this paper, we present HigherHRNet: a novel bottom-up human pose estimation method for learning scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach is able to solve the scale variation challenge in bottom-up multi-person pose estimation and localize keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of feature map outputs from HRNet and upsampled higher-resolution outputs through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes. The code and models are available at https://github.com/HRNet/Higher-HRNet-Human-Pose-Estimation.
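The feature pyramid described in the abstract pairs HRNet's output with a higher-resolution map produced by a transposed convolution. The NumPy sketch below shows only that spatial upsampling step for a single channel. It assumes a 4×4 kernel with stride 2 and padding 1 (a common choice that exactly doubles height and width) and a stride-4 HRNet output for a 512×512 input; these settings and the kernel weights are illustrative placeholders, not the paper's learned parameters, and the real module has many channels and additional layers.

```python
import numpy as np


def conv_transpose2d(x: np.ndarray, kernel: np.ndarray, stride: int = 2, padding: int = 1) -> np.ndarray:
    """Single-channel 2D transposed convolution without bias.

    Every input pixel adds a copy of the kernel, scaled by that pixel's value,
    into the output at a stride-spaced offset. Cropping `padding` rows and
    columns from each border gives the usual output size
    (in - 1) * stride - 2 * padding + kernel_size.
    """
    h, w = x.shape
    k = kernel.shape[0]
    full = np.zeros(((h - 1) * stride + k, (w - 1) * stride + k))
    for i in range(h):
        for j in range(w):
            full[i * stride:i * stride + k, j * stride:j * stride + k] += x[i, j] * kernel
    return full[padding:full.shape[0] - padding, padding:full.shape[1] - padding]


# Assumed setup: 512x512 input, HRNet features at stride 4 (one channel shown).
rng = np.random.default_rng(0)
feat_quarter = rng.standard_normal((128, 128))
kernel = np.full((4, 4), 0.25)  # placeholder weights; learned in a real network
feat_half = conv_transpose2d(feat_quarter, kernel)  # stride-2 level of the pyramid

pyramid = [feat_quarter, feat_half]
print([f.shape for f in pyramid])  # [(128, 128), (256, 256)]
```

With kernel 4, stride 2 and padding 1, the size rule gives (128 - 1) * 2 - 2 + 4 = 256, so the new level has exactly twice the resolution of the HRNet output.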


Summary

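The paper introduces HigherHRNet, a scale-aware high-resolution network that upsamples HRNet's output with a deconvolution (transposed convolution) module to predict higher-resolution heatmaps and localize keypoints more precisely.
It applies multi-resolution supervision during training, rendering the ground-truth heatmap at every resolution with the same Gaussian kernel standard deviation instead of varying it with scale.
It reports 70.5% AP on COCO test-dev, surpassing all existing bottom-up methods, and 67.6% AP on CrowdPose test, surpassing all top-down methods, which supports its robustness in crowded scenes.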
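Multi-resolution supervision with a fixed Gaussian kernel can be illustrated by rendering one keypoint's target heatmap at two pyramid levels. This is a minimal NumPy sketch that assumes strides of 4 and 2 relative to a square 512×512 input and σ = 2 heatmap pixels at both levels; these values are illustrative assumptions. The keypoint coordinates are rescaled for each level but σ is not, so, measured in input-image pixels, the target at stride 2 is half as wide as the one at stride 4.

```python
import numpy as np

SIGMA = 2.0  # assumed Gaussian std in heatmap pixels, identical at every level


def render_heatmap(size: int, cx: float, cy: float, sigma: float) -> np.ndarray:
    """Square heatmap with an unnormalized Gaussian (peak 1.0) centred at (cx, cy)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))


def multi_resolution_targets(keypoint_xy, input_size, strides=(4, 2), sigma=SIGMA):
    """One target per pyramid level: keypoint rescaled by the stride, sigma unchanged."""
    x, y = keypoint_xy
    return {
        s: render_heatmap(input_size // s, x / s, y / s, sigma)
        for s in strides
    }


targets = multi_resolution_targets((200.0, 120.0), input_size=512)
for stride, hm in targets.items():
    row, col = np.unravel_index(np.argmax(hm), hm.shape)
    # The Gaussian's width in input-image pixels is SIGMA * stride.
    print(stride, hm.shape, (int(row), int(col)), "sigma in input pixels:", SIGMA * stride)
# 4 (128, 128) (30, 50) sigma in input pixels: 8.0
# 2 (256, 256) (60, 100) sigma in input pixels: 4.0
```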


The paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" addresses scale variation in bottom-up human pose estimation, where small persons are particularly hard to localize. The authors introduce HigherHRNet, a method that uses high-resolution feature pyramids to localize keypoints more precisely, especially for smaller individuals.

Key Contributions

Scale-Aware High-Resolution Network: HigherHRNet builds a feature pyramid from HRNet's output feature maps plus a higher-resolution level produced by a deconvolution (transposed convolution) module, and predicts heatmaps from it. The higher-resolution heatmaps improve keypoint localization, which particularly benefits small persons in an image.
Multi-Resolution Supervision: During training, ground-truth heatmaps supervise the predictions at each resolution of the pyramid. The standard deviation of the Gaussian kernel used to render these heatmaps is kept the same at every resolution instead of being varied with scale, so the higher-resolution targets are narrower relative to the image and require more precise localization.
Heatmap Aggregation Strategy: During inference, HigherHRNet aggregates the heatmaps predicted at multiple resolutions into a single prediction, making the final estimate scale-aware and improving accuracy for persons of different sizes.
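The aggregation step can be sketched as resizing every predicted heatmap to one common resolution and averaging them per pixel. This minimal NumPy version assumes bilinear interpolation with half-pixel centres and a plain unweighted mean, which is our reading of the paper's description rather than a transcription of the official code. A real model would aggregate one heatmap per keypoint type, and the input maps below are random placeholders.

```python
import numpy as np


def bilinear_resize(hm: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of a 2D map using half-pixel-centre coordinates."""
    in_h, in_w = hm.shape
    # Map each output pixel centre back to (clamped) input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = hm[y0][:, x0] * (1 - wx) + hm[y0][:, x1] * wx
    bottom = hm[y1][:, x0] * (1 - wx) + hm[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate_heatmaps(heatmaps, out_h: int, out_w: int) -> np.ndarray:
    """Upsample every heatmap to (out_h, out_w) and average them per pixel."""
    return np.mean([bilinear_resize(h, out_h, out_w) for h in heatmaps], axis=0)


# Hypothetical predictions for one keypoint type on a 64x64 input.
rng = np.random.default_rng(0)
pred_stride4 = rng.random((16, 16))  # placeholder heatmap at 1/4 resolution
pred_stride2 = rng.random((32, 32))  # placeholder heatmap at 1/2 resolution
fused = aggregate_heatmaps([pred_stride4, pred_stride2], 64, 64)
print(fused.shape)  # (64, 64)
```

Averaging after upsampling lets the coarser map contribute context while the finer map contributes sharper peaks; a weighted mean would be an easy variation if one level proved more reliable.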

Numerical Results

The empirical results demonstrate the effectiveness of HigherHRNet. On COCO test-dev, the model achieves 70.5% AP without refinement or other post-processing techniques, a new state of the art that surpasses all existing bottom-up methods, and it outperforms the previous best bottom-up method by 2.5% AP on medium persons. On CrowdPose test, HigherHRNet achieves 67.6% AP, surpassing all top-down methods, which indicates robustness in crowded scenes.

Implications and Speculations

The methodological advances in HigherHRNet are a meaningful contribution to computer vision and pose estimation. Because it handles scale variation more effectively, the approach could benefit deployed pose estimation systems, for example in surveillance and interactive applications, where both accuracy and computational efficiency matter.

More broadly, HigherHRNet shows that high-resolution feature pyramids can improve robustness to scale variation, which suggests exploring similar designs in other tasks such as object detection or scene understanding.

Future Directions

The promising results of HigherHRNet encourage further exploration of even higher-resolution features and adaptive pyramid designs for datasets with wide scale variation. Extending such networks to three-dimensional human pose estimation is another possible direction for research and application.

In conclusion, HigherHRNet offers an effective approach to bottom-up human pose estimation, addressing scale variation and setting new accuracy benchmarks among bottom-up methods. As demand for accurate multi-person pose estimation grows, techniques like these are likely to inform future human-computer interaction systems.


GitHub - HRNet/HigherHRNet-Human-Pose-Estimation: This is an official implementation of our CVPR 2020 paper "HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation" (https://arxiv.org/abs/1908.10357)