- The paper introduces a method that transfers segmentation information via pose similarity to reduce reliance on dense annotations.
- It employs pose cluster discovery, guided morphing, and refinement to generate reliable annotation proxies from keypoint data.
- The approach achieves a 6-point mIoU improvement on the PASCAL-Person-Part dataset and generalizes to other categories that can be annotated with keypoints.
Weakly and Semi Supervised Human Body Part Parsing via Pose-Guided Knowledge Transfer
The paper presents a method that improves human body part parsing by leveraging pose information for semi-supervised learning. Human part parsing, i.e., semantic segmentation of the human body into parts, underpins many computer vision tasks, such as action recognition and human-computer interaction. The task typically requires large amounts of densely annotated pixel-level data, which are expensive and labor-intensive to obtain. This work instead uses easily obtainable keypoint annotations to approximate part segmentations and thereby expand the pool of training data.
Methodology
The approach relies on anatomical similarities between humans. Specifically, the paper proposes transferring segmentation information between individuals with similar poses. To operationalize this, the authors introduce a mechanism called "pose-guided knowledge transfer," which includes several steps:
- Pose Cluster Discovery: The method first identifies individuals with similar poses by computing Euclidean distances between keypoint sets in a normalized pose space.
- Pose-Guided Morphing and Prior Generation: After clustering similar poses, the part segmentation data from clustered individuals undergoes a transformation to align with the target individual’s pose using an affine transformation. This leads to the creation of a part-level prior, which serves as a strong annotation proxy.
- Prior Refinement: A refinement network, inspired by "U-Net" architectures, takes these priors and refines the segmentations using the input image, ensuring pixel-wise segmentation accuracy based on contextual and local image information.
- Semi-Supervised Parsing Network Training: This enhanced annotation data expands the training set for the parsing network, leading to improved semantic segmentation outcomes.
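The first two steps above can be sketched with plain numpy. This is a minimal illustration, not the authors' implementation: the function names (`pose_distance`, `nearest_poses`, `estimate_affine`) are hypothetical, poses are represented as (N, 2) keypoint arrays, and a single least-squares affine transform stands in for the paper's part-level morphing.

```python
import numpy as np

def normalize_pose(keypoints):
    """Center keypoints on their mean and scale to unit norm, so the
    pose distance ignores translation and overall body size."""
    pts = np.asarray(keypoints, dtype=float)
    pts = pts - pts.mean(axis=0)
    scale = np.linalg.norm(pts)
    return pts / scale if scale > 0 else pts

def pose_distance(kp_a, kp_b):
    """Euclidean distance between two poses in the normalized space."""
    return np.linalg.norm(normalize_pose(kp_a) - normalize_pose(kp_b))

def nearest_poses(target_kp, annotated_kps, k=3):
    """Indices of the k annotated poses most similar to the target
    (a simple stand-in for pose cluster discovery)."""
    dists = [pose_distance(target_kp, kp) for kp in annotated_kps]
    return np.argsort(dists)[:k]

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping source keypoints onto
    target keypoints; applied to a source mask, this morphs its
    segmentation toward the target pose."""
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    A = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) homogeneous
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2) solution
    return M.T                                     # (2, 3) affine matrix

def warp_points(M, pts):
    """Apply a 2x3 affine transform to an array of 2D points."""
    pts = np.asarray(pts, dtype=float)
    return pts @ M[:, :2].T + M[:, 2]
```

In the paper the morphing operates per body part rather than on the whole figure, and the warped masks are then averaged into a part-level prior; the sketch only shows the geometric core of that step.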
Results
The results demonstrate a substantial improvement in semantic part segmentation. The proposed method improved mean Intersection-over-Union (mIoU) by 6 percentage points over strongly-supervised baselines on the PASCAL-Person-Part dataset, reaching an mIoU of 62.60. The approach also extends to other categories, such as horses and cows, demonstrating versatility across any class that can be annotated with keypoints. Augmenting the training set with annotations synthesized by this framework yielded state-of-the-art results.
Implications and Future Directions
This research underscores the potential of utilizing existing annotations (keypoints in this context) across datasets to economize the costly task of semantic segmentation. It suggests that similar methodologies could be leveraged in tasks involving other anatomically defined entities, given that their structure can be captured via keypoints.
The theoretical implications are notable, highlighting the potential of knowledge transfer in deep learning, specifically how geometric transformations such as pose-guided morphing can turn sparse annotations into useful training signal. Practically, it reduces reliance on densely annotated datasets, enabling broader application in settings where part annotations are scarce.
Future work may build on this knowledge transfer methodology with more complex transformations for added robustness. There is also potential to integrate motion dynamics in temporal datasets to predict part segmentations over sequences, enhancing real-time video-based analytics.
In conclusion, this research marks a significant step in reducing the dependence on annotated datasets for complex vision tasks, broadening the applicability and efficiency of semantic segmentation models in computer vision.