MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network (1807.04067v1)

Published 11 Jul 2018 in cs.CV

Abstract: In this paper, we present MultiPoseNet, a novel bottom-up multi-person pose estimation architecture that combines a multi-task model with a novel assignment method. MultiPoseNet can jointly handle person detection, keypoint detection, person segmentation and pose estimation problems. The novel assignment method is implemented by the Pose Residual Network (PRN) which receives keypoint and person detections, and produces accurate poses by assigning keypoints to person instances. On the COCO keypoints dataset, our pose estimation method outperforms all previous bottom-up methods both in accuracy (+4-point mAP over previous best result) and speed; it also performs on par with the best top-down methods while being at least 4x faster. Our method is the fastest real time system with 23 frames/sec. Source code is available at: https://github.com/mkocabas/pose-residual-network


Summary

The paper introduces a novel architecture that leverages a Pose Residual Network to efficiently group keypoints and enhance multi-person pose estimation accuracy.
The methodology combines a shared ResNet-FPN backbone with parallel subnets for keypoint detection, person segmentation, and person detection.
Experimental results show a 4-point mAP improvement over the previous best bottom-up result on COCO and real-time performance at approximately 23 FPS, underscoring its practical potential.


The paper introduces MultiPoseNet, an architecture for multi-person pose estimation that combines a bottom-up approach with multi-task learning. It jointly performs person detection, keypoint detection, person segmentation, and pose estimation while remaining both fast and accurate. Central to the proposal is the Pose Residual Network (PRN), which produces the final poses by assigning detected keypoints to detected person instances.

Methodology Overview

MultiPoseNet integrates several tasks into a single framework. At its core is a shared backbone, a ResNet with a Feature Pyramid Network (FPN), that extracts multi-scale features for all subsequent stages. This backbone feeds parallel subnets that produce keypoint heatmaps, person segmentation, and person bounding boxes. Because the backbone is computed once and shared by every task, the system handles several problems without running a separate network for each, which keeps it fast without a significant loss in accuracy.
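To make the "ResNet with FPN" backbone concrete, the following is a toy sketch of the FPN top-down pathway: starting from the coarsest backbone map, each level is upsampled 2x (nearest neighbour) and added to the next finer backbone map. The assumptions are loud ones: single-channel maps stored as nested lists, identity lateral connections in place of learned 1x1 convolutions, and no 3x3 smoothing convolution after each merge. The function names `upsample2x`, `add_maps`, and `fpn_top_down` are hypothetical and this is not the paper's configuration.

```python
# Toy sketch of an FPN-style top-down pathway (hypothetical helper names).
# Assumptions, not from the paper: single-channel maps as nested lists,
# identity lateral connections instead of learned 1x1 convolutions,
# and no 3x3 smoothing convolution after each merge.
from typing import List

FeatureMap = List[List[float]]


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour 2x upsampling: each cell becomes a 2x2 block."""
    out: FeatureMap = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps with identical shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def fpn_top_down(bottom_up: List[FeatureMap]) -> List[FeatureMap]:
    """Merge backbone maps (finest first, each level half the size of the
    previous one) so every pyramid level also carries coarse context."""
    pyramid = [bottom_up[-1]]  # the coarsest level starts the top-down pass
    for lateral in reversed(bottom_up[:-1]):
        merged = add_maps(upsample2x(pyramid[0]), lateral)
        pyramid.insert(0, merged)
    return pyramid  # finest first, same order as the input


# Toy backbone outputs: 4x4, 2x2 and 1x1 maps.
c2 = [[1.0] * 4 for _ in range(4)]
c3 = [[2.0] * 2 for _ in range(2)]
c4 = [[4.0]]

p2, p3, p4 = fpn_top_down([c2, c3, c4])
print(p4)     # [[4.0]]
print(p3)     # [[6.0, 6.0], [6.0, 6.0]]
print(p2[0])  # [7.0, 7.0, 7.0, 7.0]
```

The sketch stops at the pyramid; in MultiPoseNet the shared features then feed the task-specific subnets, so the pyramid is computed once per image regardless of how many tasks consume it.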

The PRN resolves the ambiguities inherent in grouping keypoints. It is a residual multilayer perceptron that considers all joints of a person simultaneously, unlike prior methods that rely primarily on pairwise or unary relations between keypoints. This lets it handle overlapping person detections, a common source of grouping errors for bottom-up approaches.
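The PRN idea can be illustrated with a small, untrained sketch. It takes full-image keypoint heatmaps and one person box (the two kinds of input the abstract lists), crops the heatmaps to the box, flattens all keypoint channels into one vector so the MLP sees every joint jointly, and adds the MLP output back to its input as a residual before a per-keypoint softmax. Everything concrete here is an assumption for illustration: the hypothetical names `crop`, `prn_forward`, and `argmax_2d`, the nested-list layout, skipping the resize to a fixed input size, the single hidden ReLU layer, and the placement of the softmax. It is not the authors' implementation.

```python
# Minimal, untrained sketch of a PRN-style residual MLP for one person.
# Assumptions (hypothetical, not the authors' code): heatmaps are nested
# lists indexed [keypoint][row][col]; the crop is used at its native size
# (no resize); one hidden ReLU layer; residual added before a softmax.
import math
from typing import List, Tuple

Heatmaps = List[List[List[float]]]  # [K][H][W]
Matrix = List[List[float]]


def crop(heatmaps: Heatmaps, box: Tuple[int, int, int, int]) -> Heatmaps:
    """Cut the box (x0, y0, x1, y1), end-exclusive, out of every channel."""
    x0, y0, x1, y1 = box
    return [[row[x0:x1] for row in ch[y0:y1]] for ch in heatmaps]


def prn_forward(x: Heatmaps, w1: Matrix, b1: List[float],
                w2: Matrix, b2: List[float]) -> Heatmaps:
    k, h, w = len(x), len(x[0]), len(x[0][0])
    flat = [v for ch in x for row in ch for v in row]  # all joints together
    hidden = [max(0.0, sum(wi * v for wi, v in zip(row, flat)) + b)
              for row, b in zip(w1, b1)]
    delta = [sum(wi * v for wi, v in zip(row, hidden)) + b
             for row, b in zip(w2, b2)]
    res = [f + d for f, d in zip(flat, delta)]  # residual connection
    out: Heatmaps = []
    size = h * w
    for c in range(k):  # softmax over the locations of each keypoint
        chan = res[c * size:(c + 1) * size]
        m = max(chan)
        exps = [math.exp(v - m) for v in chan]
        z = sum(exps)
        probs = [e / z for e in exps]
        out.append([probs[r * w:(r + 1) * w] for r in range(h)])
    return out


def argmax_2d(ch: Matrix) -> Tuple[int, int]:
    """(row, col) of the largest value in one channel."""
    return max(((r, c) for r in range(len(ch)) for c in range(len(ch[0]))),
               key=lambda rc: ch[rc[0]][rc[1]])


# Two keypoints on a 4x4 image; the person box covers the top-left 3x3.
K, H, W = 2, 4, 4
full = [[[0.0] * W for _ in range(H)] for _ in range(K)]
full[0][1][1] = 5.0  # joint 0 peak inside the box
full[1][2][3] = 5.0  # joint 1 peak outside the box (another person)
full[1][2][2] = 2.0  # weaker joint 1 response inside the box
x = crop(full, (0, 0, 3, 3))

n_in, n_hidden = K * 3 * 3, 4
w1 = [[0.0] * n_in for _ in range(n_hidden)]  # zero weights = untrained
b1 = [0.0] * n_hidden
w2 = [[0.0] * n_hidden for _ in range(n_in)]
b2 = [0.0] * n_in
y = prn_forward(x, w1, b1, w2, b2)
print([argmax_2d(ch) for ch in y])  # [(1, 1), (2, 2)]
```

With all weights at zero the MLP's correction is zero, so the residual connection passes the cropped heatmaps through unchanged (apart from the softmax) and each joint keeps its in-box peak; joint 1's stronger peak outside the box is ignored because it was cropped away. Training would let the MLP move probability mass between locations, for example to suppress a response that belongs to an overlapping neighbour.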

Experimental Results

On the COCO keypoints dataset, MultiPoseNet improves mean Average Precision (mAP) by 4 points over the previous best bottom-up result and performs on par with the best top-down methods while being at least 4x faster. It runs in real time at approximately 23 frames per second (FPS), which the authors report makes it the fastest real-time system among those compared.
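The speed figures translate into per-frame latency as follows. The top-down bound is only what the abstract's "at least 4x faster" claim implies for the best top-down methods; it is not a number measured in the paper.

```latex
t_{\text{frame}} = \frac{1}{23\ \text{frames/s}} \approx 0.0435\ \text{s} \approx 43.5\ \text{ms}

\text{top-down throughput} \le \frac{23\ \text{frames/s}}{4} = 5.75\ \text{frames/s}
\quad\Longrightarrow\quad
t_{\text{frame}}^{\text{top-down}} \ge \frac{4}{23}\ \text{s} \approx 174\ \text{ms}
```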

The PRN also assigns keypoints more accurately than the contemporary bottom-up grouping methods it is compared against. Separate experiments on person detection and person segmentation support the robustness of the multi-task model: the authors report that it outperforms the existing methods they compare with on these person-specific tasks.

Implications and Future Work

This work illustrates how multi-task learning can handle complex pose estimation problems effectively. A unified architecture such as MultiPoseNet reduces computational cost while maintaining strong results across multiple evaluation criteria, and PRN's ability to handle densely populated scenes suggests it is applicable to real-world scenarios.

Looking forward, there is potential for exploring variations in the backbone architecture to further boost performance and reduce computational overhead. Additionally, integrating more sophisticated segmentation models might improve accuracy in complex environments where individuals are partially obscured or closely packed.

The broader implications for AI systems include advancing real-time pose estimation capabilities in applications such as surveillance, human-computer interaction, and augmented reality. Continued optimization and the introduction of novel architectures hold promise for further advancements in this domain.


GitHub

mkocabas/pose-residual-network: Code for the Pose Residual Network introduced in the paper 'MultiPoseNet: Fast Multi-Person Pose Estimation using Pose Residual Network' (ECCV 2018)
