Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation (1809.04987v3)

Published 13 Sep 2018 in cs.CV and cs.RO

Abstract: In this paper we present our winning entry at the 2018 ECCV PoseTrack Challenge on 3D human pose estimation. Using a fully-convolutional backbone architecture, we obtain volumetric heatmaps per body joint, which we convert to coordinates using soft-argmax. Absolute person center depth is estimated by a 1D heatmap prediction head. The coordinates are back-projected to 3D camera space, where we minimize the L1 loss. Key to our good results is the training data augmentation with randomly placed occluders from the Pascal VOC dataset. In addition to reaching first place in the Challenge, our method also surpasses the state-of-the-art on the full Human3.6M benchmark among methods that use no additional pose datasets in training. Code for applying synthetic occlusions is available at https://github.com/isarandi/synthetic-occlusion.

Summary

The paper presents the winning entry of the 2018 ECCV PoseTrack Challenge on 3D human pose estimation, which combines volumetric heatmaps decoded by soft-argmax with synthetic occlusion augmentation.
A fully-convolutional network, trained on images with randomly placed occluders from Pascal VOC, predicts 3D joint positions robustly under occlusion.
The method placed first in the Challenge and, measured by mean per joint position error (MPJPE), surpasses the state of the art on the full Human3.6M benchmark among methods that use no additional pose datasets in training.

Analysis of "Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation"

The paper "Synthetic Occlusion Augmentation with Volumetric Heatmaps for the 2018 ECCV PoseTrack Challenge on 3D Human Pose Estimation" describes the authors' winning entry in the 2018 ECCV PoseTrack Challenge on 3D human pose estimation. Built on a fully-convolutional network architecture, the method took first place in the Challenge and also improves on prior results on the Human3.6M benchmark.

Methodology

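The method predicts a volumetric heatmap per body joint with a fully-convolutional backbone and converts each heatmap into joint coordinates using soft-argmax, i.e., the expected voxel position under a softmax distribution over the heatmap. Because soft-argmax is differentiable, the loss can be applied directly to the resulting coordinates, so no explicit ground-truth heatmaps need to be constructed for training. Since the expected value can fall between voxel centers, soft-argmax also yields sub-voxel precision from a relatively coarse, and therefore less memory-intensive, heatmap. The absolute depth of the person center is estimated by a separate 1D heatmap prediction head; the coordinates are then back-projected to 3D camera space, where the L1 loss is minimized.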
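The decoding path described in the abstract (soft-argmax over per-joint volumetric heatmaps, a 1D heatmap for the absolute person-center depth, back-projection to camera space, and an L1 loss) can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: heatmaps are assumed to have shape (J, D, H, W), and the mapping from voxel indices to pixels and millimeters (the hypothetical parameters `crop_origin`, `pixels_per_voxel`, `rel_depth_range`, `abs_depth_range`) is a convention chosen only for illustration. Camera intrinsics are assumed to be a standard 3x3 pinhole matrix.

```python
import numpy as np


def softmax(logits, axes):
    """Numerically stable softmax over the given axes."""
    shifted = logits - logits.max(axis=axes, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axes, keepdims=True)


def soft_argmax_3d(heatmaps):
    """(J, D, H, W) logits -> (J, 3) expected (x, y, z) in voxel units."""
    _, depth, height, width = heatmaps.shape
    probs = softmax(heatmaps, axes=(1, 2, 3))
    # Marginal distribution along each axis, then its expected index.
    x = probs.sum(axis=(1, 2)) @ np.arange(width)
    y = probs.sum(axis=(1, 3)) @ np.arange(height)
    z = probs.sum(axis=(2, 3)) @ np.arange(depth)
    return np.stack([x, y, z], axis=-1)


def soft_argmax_1d(logits, low, high):
    """1D heatmap over evenly spaced bins in [low, high] -> expected value."""
    bins = np.linspace(low, high, logits.shape[-1])
    return softmax(logits, axes=-1) @ bins


def back_project(uv, depth, intrinsics):
    """Pixel coordinates (J, 2) and absolute depths (J,) -> camera-space (J, 3)."""
    homogeneous = np.concatenate([uv, np.ones((len(uv), 1))], axis=1)
    rays = homogeneous @ np.linalg.inv(intrinsics).T  # rows: K^-1 [u, v, 1]
    return rays * depth[:, None]


def decode_pose(heatmaps, center_depth_logits, intrinsics, crop_origin,
                pixels_per_voxel, rel_depth_range, abs_depth_range):
    """Hypothetical decoding convention, for illustration only."""
    voxels = soft_argmax_3d(heatmaps)
    depth_bins = heatmaps.shape[1]
    # Assumed: x/y voxel indices map linearly to pixels of the full image.
    uv = np.asarray(crop_origin) + voxels[:, :2] * pixels_per_voxel
    # Assumed: z voxel indices span rel_depth_range around the person center.
    low, high = rel_depth_range
    rel_depth = low + voxels[:, 2] * (high - low) / (depth_bins - 1)
    # Absolute person-center depth from the separate 1D heatmap head.
    center_depth = soft_argmax_1d(center_depth_logits, *abs_depth_range)
    return back_project(uv, center_depth + rel_depth, intrinsics)


def l1_loss(pred, target):
    """Mean absolute error between predicted and target coordinates."""
    return np.abs(pred - target).mean()
```

NumPy is used here only to show the arithmetic; in training, the same operations would be written in an autodiff framework so that the gradient of the L1 loss flows back through soft-argmax into the heatmaps.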

Training is enhanced by data augmentation with synthetic occlusions: occluders taken from the Pascal VOC dataset are placed at random positions in the training images. This regularizes the model and helps it handle the real-world occlusions encountered in 3D human pose estimation; the authors identify this augmentation as key to their results. The task is further complicated by the characteristics of the challenge dataset, which lacks occluder-free bounding boxes and camera intrinsic information.
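The augmentation can be sketched as alpha-blending object cut-outs onto a training image at random positions and scales. This is a minimal NumPy sketch, not the API of the isarandi/synthetic-occlusion repository: it assumes each occluder is an RGBA uint8 array (for example, a Pascal VOC object cut out with its segmentation mask), and the helper names `resize_nearest`, `paste_occluder`, and `occlude_randomly`, as well as the occluder count and size ranges, are hypothetical, illustrative choices.

```python
import numpy as np


def resize_nearest(arr, scale):
    """Nearest-neighbor resize of an (H, W, C) array by a scale factor."""
    h, w = arr.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    return arr[rows][:, cols]


def paste_occluder(image, occluder_rgba, center):
    """Alpha-blend an RGBA occluder onto an RGB image, centered at (x, y).

    Parts of the occluder that fall outside the image are clipped."""
    img_h, img_w = image.shape[:2]
    occ_h, occ_w = occluder_rgba.shape[:2]
    # Top-left corner of the occluder in image coordinates (may be negative).
    x0 = int(round(center[0] - occ_w / 2))
    y0 = int(round(center[1] - occ_h / 2))
    # Intersection of the occluder's box with the image.
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1, iy1 = min(x0 + occ_w, img_w), min(y0 + occ_h, img_h)
    if ix0 >= ix1 or iy0 >= iy1:
        return image.copy()  # Occluder lies entirely outside the image.
    out = image.astype(np.float64)
    occ = occluder_rgba[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0].astype(np.float64)
    alpha = occ[..., 3:4] / 255.0
    region = out[iy0:iy1, ix0:ix1]
    out[iy0:iy1, ix0:ix1] = alpha * occ[..., :3] + (1.0 - alpha) * region
    return np.rint(out).astype(image.dtype)


def occlude_randomly(image, occluders, rng, max_count=7, size_range=(0.2, 0.5)):
    """Paste 1..max_count randomly chosen, scaled and placed occluders.

    size_range gives the occluder's longer side as a fraction of the
    image's shorter side (illustrative values, not the authors')."""
    img_h, img_w = image.shape[:2]
    result = image.copy()
    for _ in range(rng.integers(1, max_count + 1)):
        occluder = occluders[rng.integers(len(occluders))]
        target = rng.uniform(*size_range) * min(img_h, img_w)
        occluder = resize_nearest(occluder, target / max(occluder.shape[:2]))
        center = (rng.uniform(0, img_w), rng.uniform(0, img_h))
        result = paste_occluder(result, occluder, center)
    return result


# Usage: occlude a gray image with an opaque red rectangle.
rng = np.random.default_rng(0)
image = np.full((256, 256, 3), 128, dtype=np.uint8)
red_rect = np.zeros((50, 40, 4), dtype=np.uint8)
red_rect[..., 0] = 255  # red channel
red_rect[..., 3] = 255  # fully opaque
augmented = occlude_randomly(image, [red_rect], rng)
```

Alpha blending lets soft object boundaries from a segmentation mask show through naturally; nearest-neighbor resizing is used only to keep the sketch free of image-library dependencies.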

Results and Implications

Empirical evaluation supports the method's effectiveness: it took first place in the 2018 ECCV PoseTrack Challenge, outperforming the other competing entries. Performance was measured by the mean per joint position error (MPJPE), and the method proved robust across a range of activities, especially those involving significant occlusion. Notably, it performed particularly well on actions such as Sitting and Sitting Down, which is consistent with the benefit of synthetic occlusion augmentation.
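Because the results are reported as mean per joint position error (MPJPE), a minimal definition may help: the Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and poses (for Human3.6M, conventionally in millimeters). The sketch below assumes (N, J, 3) arrays and optionally subtracts a root joint first, since Human3.6M results are commonly reported on root-relative poses; the exact alignment used in the Challenge evaluation is an assumption not stated here.

```python
import numpy as np


def mpjpe(pred, gt, root_index=None):
    """Mean per joint position error for (N, J, 3) arrays of joint positions.

    If root_index is given, both poses are first made root-relative by
    subtracting that joint, so a global translation error is not counted."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if root_index is not None:
        pred = pred - pred[:, root_index:root_index + 1]
        gt = gt - gt[:, root_index:root_index + 1]
    # Euclidean distance per joint, averaged over joints and poses.
    return np.linalg.norm(pred - gt, axis=-1).mean()


# Usage: one pose with two joints; the first is off by a 3-4-5 triangle.
gt = np.zeros((1, 2, 3))
pred = np.array([[[3.0, 4.0, 0.0], [0.0, 0.0, 0.0]]])
print(mpjpe(pred, gt))  # 2.5
```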

On the full Human3.6M benchmark, the method also outperforms state-of-the-art methods that do not use additional pose datasets for training. This suggests that the approach makes effective use of the available training data without relying on extra pose annotations.

Practical and Theoretical Implications

The approach is relevant to human-centric applications such as human-robot interaction and virtual reality, where precise joint localization is crucial. Synthetic occlusion augmentation offers a simple way to address occlusion, a common difficulty in pose estimation.

On the design side, volumetric heatmaps decoded with soft-argmax make training fully differentiable end to end: a loss on the final 3D coordinates trains the heatmap-predicting network directly, without hand-constructed target heatmaps.

Future Directions

Future research could explore applying this synthetic occlusion technique to other settings, such as multi-person or real-time pose estimation. Further work could also integrate more sophisticated data augmentation strategies or explore alternative architectures that further reduce dependence on large volumes of labeled data.

In conclusion, the paper presents a careful study of 3D human pose estimation in which synthetic occlusion augmentation is key to robustness under occlusion. It is a useful contribution to the growing body of work on 3D human pose estimation.

GitHub

GitHub - isarandi/synthetic-occlusion: Synthetic Occlusion Augmentation (115 stars)