Dense Human Body Correspondences Using Convolutional Networks (1511.05904v2)

Published 18 Nov 2015 in cs.CV and cs.GR

Abstract: We propose a deep learning approach for finding dense correspondences between 3D scans of people. Our method requires only partial geometric information in the form of two depth maps or partial reconstructed surfaces, works for humans in arbitrary poses and wearing any clothing, does not require the two people to be scanned from similar viewpoints, and runs in real time. We use a deep convolutional neural network to train a feature descriptor on depth map pixels, but crucially, rather than training the network to solve the shape correspondence problem directly, we train it to solve a body region classification problem, modified to increase the smoothness of the learned descriptors near region boundaries. This approach ensures that nearby points on the human body are nearby in feature space, and vice versa, rendering the feature descriptor suitable for computing dense correspondences between the scans. We validate our method on real and synthetic data for both clothed and unclothed humans, and show that our correspondences are more robust than is possible with state-of-the-art unsupervised methods, and more accurate than those found using methods that require full watertight 3D geometry.

Summary

The paper introduces a CNN-based method that computes dense correspondences between 3D human scans with high accuracy even using partial data.
The approach employs ensemble classification for body region and dense label tasks, resulting in robust, smooth feature embeddings across the human body.
Experimental results show superior performance with average errors of 2.00 cm (intra-subject) and 2.35 cm (inter-subject) on the FAUST dataset, outperforming previous techniques.

Overview of "Dense Human Body Correspondences Using Convolutional Networks"

This paper presents an innovative approach to computing dense correspondences between 3D scans of human bodies using convolutional neural networks (CNNs). The proposed methodology stands out due to its ability to handle partial geometric information and its robustness to varying poses and clothing without requiring similar viewpoints between the scanned subjects. This research addresses a fundamental task in 3D computer vision with potential applications in motion tracking, shape analysis, and recognition.

Methodology

The authors propose a deep convolutional network that computes a feature descriptor for every depth map pixel. Rather than training the network to solve the correspondence problem directly, they train it on a body region classification task, modified to increase the smoothness of the learned descriptors near region boundaries. As a result, points that are close together on the body map to nearby locations in the learned feature space, and points that are far apart map to distant ones, which makes the descriptors suitable for computing dense correspondences.
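Once such descriptors exist, one simple way to turn them into correspondences is a nearest-neighbor search in feature space. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `match_descriptors`, the 16-dimensional descriptor size, and the random toy data are all illustrative choices.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """For each row of desc_a, return the index of the nearest row of desc_b.

    desc_a: (Na, D) array of per-pixel descriptors from scan A.
    desc_b: (Nb, D) array of per-pixel descriptors from scan B.
    """
    # Squared Euclidean distances via ||a||^2 - 2 a.b + ||b||^2.
    sq_a = np.sum(desc_a ** 2, axis=1, keepdims=True)   # shape (Na, 1)
    sq_b = np.sum(desc_b ** 2, axis=1)                  # shape (Nb,)
    dists = sq_a - 2.0 * desc_a @ desc_b.T + sq_b       # shape (Na, Nb)
    return np.argmin(dists, axis=1)

# Toy data (assumption): scan A holds the same body points as scan B,
# shuffled and with slightly perturbed descriptors.
rng = np.random.default_rng(0)
desc_b = rng.normal(size=(50, 16))
perm = rng.permutation(50)
desc_a = desc_b[perm] + 0.01 * rng.normal(size=(50, 16))

matches = match_descriptors(desc_a, desc_b)
```

In this toy setting `matches` recovers the shuffling permutation. A brute-force distance matrix is only practical for small point sets; a spatial index or approximate search would be a natural substitute for full depth maps.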

Descriptor learning is framed as ensemble classification, which allows heterogeneous training datasets to be combined. Two kinds of classification task are used: inter-subject key point classification and intra-subject dense label classification over segmentations of the human body. Training on these tasks together yields accurate, smooth embeddings without exhaustively computing distances between points during training.
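The ensemble idea can be illustrated with a shared descriptor feeding several classification heads, one per labeling task, with the per-task losses summed. This is a hedged sketch only: the linear heads, the helper names `softmax_cross_entropy` and `ensemble_loss`, the class counts, and the descriptor size are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def ensemble_loss(descriptors, heads, labels_per_task):
    """Sum of classification losses over all tasks.

    descriptors:     (N, D) shared per-pixel descriptors.
    heads:           list of (W, b) pairs, one linear classifier per task.
    labels_per_task: list of (N,) integer label arrays, one per task.
    """
    total = 0.0
    for (W, b), labels in zip(heads, labels_per_task):
        logits = descriptors @ W + b
        total += softmax_cross_entropy(logits, labels)
    return total

# Toy setup (assumption): 8 pixels, 16-D descriptors, two labeling tasks
# with 5 and 7 classes, and zero-initialized heads.
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(8, 16))
num_classes = [5, 7]
heads = [(np.zeros((16, k)), np.zeros(k)) for k in num_classes]
labels_per_task = [rng.integers(0, k, size=8) for k in num_classes]

loss = ensemble_loss(descriptors, heads, labels_per_task)
```

With zero-initialized heads every class is equally likely, so the loss equals log 5 + log 7. Because all heads read the same descriptor, lowering the summed loss pushes that descriptor to separate regions under every labeling at once, which is the intuition behind combining several classification tasks.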

Experimental Validation

The method is tested on both real and synthetic datasets, covering clothed and unclothed subjects. It is more robust than current unsupervised methods, including under large deformations that were previously challenging for state-of-the-art techniques. The authors also demonstrate that the method extends to real-time applications, including performance capture with a single RGB-D camera.

Results and Comparisons

The method establishes robust dense correspondences even with partial input data or large deformations between models. When integrated into conventional matching systems, it improves correspondence accuracy compared with previous registration techniques. Notably, it achieves an average error of 2.00 cm for intra-subject pairs and 2.35 cm for inter-subject pairs on the FAUST dataset, outperforming comparable methods.
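An average error in centimetres of this kind can be computed, in its simplest form, as the mean distance between each predicted corresponding point and its ground-truth location. The sketch below assumes Euclidean distance and coordinates in metres; the function name and toy points are illustrative, and the actual benchmark protocol may differ.

```python
import numpy as np

def mean_correspondence_error_cm(predicted_m, ground_truth_m):
    """Mean Euclidean error in centimetres between matched 3D points.

    predicted_m, ground_truth_m: (N, 3) arrays in metres (assumption).
    """
    errors_m = np.linalg.norm(predicted_m - ground_truth_m, axis=1)
    return 100.0 * errors_m.mean()

# Toy data (assumption): three points displaced by 2 cm, 3 cm and 1 cm.
ground_truth = np.array([[0.0, 0.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]])
offsets = np.array([[0.02, 0.0, 0.0],
                    [0.0, 0.03, 0.0],
                    [0.0, 0.0, 0.01]])
predicted = ground_truth + offsets

error_cm = mean_correspondence_error_cm(predicted, ground_truth)
```

For this toy input the mean error is 2 cm, since the three displacements average to 0.02 m.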

Conclusions and Future Prospects

The proposed strategy demonstrates significant improvements in both computational efficiency and correspondence accuracy for human body shapes, making it suitable for applications such as template-based performance capture in dynamic and deformable contexts. However, the method requires extensive training and depends substantially on the comprehensiveness of the training dataset.

Future work may focus on broadening the training data to cover more varied poses and clothing types. Improving outlier detection could also make the correspondence mapping more robust. As more annotated training data becomes available, deeper network architectures might further improve the quality of the embeddings and support applications beyond the current scope.

This research marks a meaningful advance in applying deep learning to 3D shape correspondence, presenting both theoretical insights and substantial empirical evidence of its efficacy.
