Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression (2104.02300v1)

Published 6 Apr 2021 in cs.CV

Abstract: In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework that is previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately needs to learn representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to the keypoint regions, respectively, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose. The code and models are available at https://github.com/HRNet/DEKR.

Summary

The paper introduces DEKR, a novel method that disentangles keypoint regression to enhance spatial localization in human pose estimation.
It employs adaptive convolutions with separate regression branches, significantly reducing jitter and miss errors on benchmarks like COCO.
The approach achieves superior precision with lower computational overhead, benefiting applications in augmented reality, robotics, and motion capture.

Disentangled Keypoint Regression for Bottom-Up Human Pose Estimation

The paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" targets the bottom-up paradigm of human pose estimation. It focuses on dense keypoint regression, which regresses keypoint positions directly and was previously inferior to the keypoint detection and grouping framework. The proposed method, Disentangled Keypoint Regression (DEKR), improves spatial localization by learning representations that focus on the keypoint regions.
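To make "dense keypoint regression" concrete, the following is a minimal sketch of how a pose could be read out of such a model's output. It assumes the network produces a center-confidence map plus, for each of K keypoints, a pair of x/y offset maps; the function name `decode_pose`, the data layout, and the toy values are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]  # a 2D map indexed as grid[y][x]


def decode_pose(
    center_heatmap: Grid,
    offsets: List[Tuple[Grid, Grid]],
) -> Tuple[Tuple[int, int], List[Tuple[float, float]]]:
    """Pick the highest-scoring center pixel and add its regressed offsets.

    `offsets` holds one (dx_map, dy_map) pair per keypoint (assumed layout).
    Returns the center as (x, y) and the keypoints as (x, y) pairs.
    """
    height, width = len(center_heatmap), len(center_heatmap[0])
    # Locate the single strongest center; a real system would keep many peaks.
    cy, cx = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: center_heatmap[p[0]][p[1]],
    )
    # Each keypoint is the center position plus the offset stored at that pixel.
    keypoints = [(cx + dx[cy][cx], cy + dy[cy][cx]) for dx, dy in offsets]
    return (cx, cy), keypoints


# Toy 3x3 example with two keypoints and constant offset maps.
heatmap = [[0.1, 0.2, 0.1], [0.0, 0.3, 0.9], [0.1, 0.1, 0.2]]
const = lambda v: [[v] * 3 for _ in range(3)]
toy_offsets = [(const(-1.0), const(0.5)), (const(0.0), const(-1.0))]
center, pose = decode_pose(heatmap, toy_offsets)
```

Because every person is read out at its own center pixel, no separate grouping step is needed, which is the main contrast with detection-and-grouping pipelines.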

Methodology Overview

The principal innovation is disentangled keypoint regression. Adaptive convolutions, implemented through a pixel-wise spatial transformer, activate the pixels in keypoint regions so that the representations are learned from those pixels. The model uses a multi-branch structure in which each branch has its own adaptive convolutions and regresses the position of a single keypoint, so each learned representation attends to one keypoint region. This separation yields better spatial accuracy than regressing all keypoints from one shared representation.
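The multi-branch idea can be sketched in a few lines. The example below assumes that one pixel's feature vector is partitioned into equal channel groups, one per keypoint, and that each branch maps only its own group to a 2D offset. The branches here are plain dot products standing in for the adaptive convolutions, and all names (`split_channels`, `linear_branch`, `regress_offsets`) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Branch = Callable[[Sequence[float]], Tuple[float, float]]


def split_channels(features: Sequence[float], num_keypoints: int) -> List[List[float]]:
    """Partition one pixel's feature vector into equal per-keypoint groups."""
    if len(features) % num_keypoints != 0:
        raise ValueError("channel count must be divisible by the number of keypoints")
    size = len(features) // num_keypoints
    return [list(features[i * size:(i + 1) * size]) for i in range(num_keypoints)]


def linear_branch(weights_x: Sequence[float], weights_y: Sequence[float]) -> Branch:
    """Build a toy branch: two dot products that produce a (dx, dy) offset.

    This stands in for a branch with dedicated adaptive convolutions.
    """
    def branch(group: Sequence[float]) -> Tuple[float, float]:
        dx = sum(w * f for w, f in zip(weights_x, group))
        dy = sum(w * f for w, f in zip(weights_y, group))
        return dx, dy
    return branch


def regress_offsets(features: Sequence[float], branches: Sequence[Branch]) -> List[Tuple[float, float]]:
    """Run each branch on its own channel group only (the disentangled part)."""
    groups = split_channels(features, len(branches))
    return [branch(group) for branch, group in zip(branches, groups)]


# Two keypoints, four channels: each branch sees only two of them.
toy_branches = [
    linear_branch([1.0, 0.0], [0.0, 1.0]),
    linear_branch([1.0, 1.0], [1.0, -1.0]),
]
toy_result = regress_offsets([1.0, 2.0, 3.0, 4.0], toy_branches)
```

The point of the sketch is the data flow: no branch reads another branch's channels, so each representation is free to specialize on its own keypoint region.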

Empirical Findings

DEKR outperforms keypoint detection and grouping methods in experiments on two benchmark datasets, COCO and CrowdPose. On COCO, DEKR achieves an average precision (AP) of 71.0 with HRNet-W48, ahead of other state-of-the-art bottom-up methods. The gain is attributed to the disentangled representations, which significantly reduce keypoint localization errors, particularly jitter and miss errors.
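For context on the AP figure, COCO keypoint AP is computed from object keypoint similarity (OKS) between predicted and ground-truth poses. The sketch below shows the standard OKS form under simplifying assumptions: the function name `oks` is illustrative, the sigma values in the example are made up rather than the dataset's official constants, and the small epsilon used by the reference evaluation code is omitted.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def oks(
    pred: Sequence[Point],
    gt: Sequence[Point],
    visible: Sequence[int],
    area: float,
    sigmas: Sequence[float],
) -> float:
    """Object keypoint similarity averaged over labeled keypoints.

    Each labeled keypoint contributes exp(-d^2 / (2 * area * k^2)),
    where d is the pixel distance to the ground truth and k = 2 * sigma.
    """
    total, count = 0.0, 0
    for (px, py), (gx, gy), v, sigma in zip(pred, gt, visible, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not count
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        k = 2.0 * sigma
        total += math.exp(-d2 / (2.0 * area * k * k))
        count += 1
    return total / count if count else 0.0


# Illustrative values only: one keypoint, squared distance 2, area 100.
example_score = oks([(1.0, 1.0)], [(0.0, 0.0)], [2], area=100.0, sigmas=[0.05])
```

Small displacements lower the per-keypoint similarity only slightly while large ones drive it toward zero, which is why reducing jitter and miss errors translates into higher AP.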

Comparative Analysis

The paper compares DEKR against state-of-the-art methods such as CenterNet and HigherHRNet. In these comparisons DEKR consistently performs better while requiring less computation, and the advantage holds in both single-scale and multi-scale testing.

Implications and Future Directions

The results matter for applications that require precise human pose estimation, such as augmented reality, motion capture, and robotics. The disentangled design improves accuracy while keeping a reasonable balance between model complexity and performance.

Future developments may explore further refinement of multi-scale regression strategies and optimization techniques to enhance the capabilities of this approach. Additionally, adapting DEKR to other domains and tasks in computer vision where spatial localization is critical could expand its utility.

Conclusion

The paper advances bottom-up human pose estimation with disentangled keypoint regression. By localizing each keypoint with a dedicated branch, the method makes dense keypoint regression a competitive alternative to keypoint detection and grouping.

GitHub

GitHub - HRNet/DEKR: This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300) (454 stars)