A Neural Network for Detailed Human Depth Estimation from a Single Image (1910.01275v2)

Published 3 Oct 2019 in cs.CV

Abstract: This paper presents a neural network to estimate a detailed depth map of the foreground human in a single RGB image. The result captures geometry details such as cloth wrinkles, which are important in visualization applications. To achieve this goal, we separate the depth map into a smooth base shape and a residual detail shape and design a network with two branches to regress them respectively. We design a training strategy to ensure both base and detail shapes can be faithfully learned by the corresponding network branches. Furthermore, we introduce a novel network layer to fuse a rough depth map and surface normals to further improve the final result. Quantitative comparison with fused 'ground truth' captured by real depth cameras and qualitative examples on unconstrained Internet images demonstrate the strength of the proposed method. The code is available at https://github.com/sfu-gruvi-3dv/deep_human.


Summary

The paper introduces a dual-branch neural network that decouples base shape and high-frequency detail estimation to capture fine features like cloth wrinkles.
It employs a two-stage training strategy with a truncated L1 loss and a Normal-Net for surface normal integration to refine depth results.
Quantitative evaluations show that the method outperforms baselines such as SURREAL and BodyNet, with lower depth error and higher 3D accuracy.

Detailed Depth Estimation of Humans from a Single RGB Image Using a Neural Network

The paper "A Neural Network for Detailed Human Depth Estimation from a Single Image" addresses detailed depth map estimation of the foreground human in a single RGB image. The authors propose a neural network architecture that separates the depth map into a low-frequency base shape and a high-frequency detail shape and estimates each separately. This lets the network capture fine geometric details such as cloth wrinkles, whereas existing methods primarily focus on coarse human body models.

Methodology Overview

The proposed architecture is a dual-branch network that estimates the base and detail components of a human depth map separately. The base shape branch targets the overall body layout, whereas the detail shape branch captures subtle geometric features such as cloth wrinkles. A two-stage training strategy keeps the two branches decoupled so that each learns its own component: the branches are first pre-trained independently and then fine-tuned jointly. During fine-tuning, a truncated L1 loss keeps the two branches consistent and prevents large errors from overwhelming the training process.
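The base/detail split and the truncated L1 loss can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its released code: the base shape is approximated with a simple box low-pass filter, the loss is the generic form min(|pred - target|, tau), and the names `box_blur`, `split_base_detail`, `truncated_l1`, `radius`, and `tau` are hypothetical.

```python
import numpy as np


def box_blur(depth, radius=2):
    """Low-pass filter a depth map with a (2*radius+1)^2 box kernel.

    Assumption: a box filter stands in for whatever smoothing defines
    the base shape; edge pixels are handled by replicating the border.
    """
    depth = np.asarray(depth, dtype=float)
    h, w = depth.shape
    k = 2 * radius + 1
    padded = np.pad(depth, radius, mode="edge")
    out = np.zeros((h, w), dtype=float)
    # Sum every shifted copy of the padded map, then normalise.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def split_base_detail(depth, radius=2):
    """Split depth into a smooth base and a residual detail map."""
    base = box_blur(depth, radius)
    detail = np.asarray(depth, dtype=float) - base
    return base, detail


def truncated_l1(pred, target, mask, tau=0.05):
    """Mean of min(|pred - target|, tau) over the masked pixels.

    Assumption: clipping each per-pixel error at tau is one generic way
    to stop large errors from dominating the loss.
    """
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    return float(np.minimum(err, tau)[mask].mean())
```

By construction, adding the two components back together recovers the input depth, which mirrors the idea of composing the final depth from the outputs of the two branches.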

Furthermore, a Normal-Net estimates surface normals, which are used to refine the composed depth. An iterative, parameter-free shape refinement module inspired by classical methods fuses the rough depth map with the estimated normals, so that the normal information resolves ambiguities in the depth estimate and improves the accuracy of the final output.
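One way to picture an iterative refinement step with no learnable weights is a Jacobi-style update that pulls each pixel toward the depths implied by its neighbours and the normals, while staying anchored to the rough depth. The sketch below is a generic illustration under stated assumptions, not the paper's layer: it assumes orthographic projection with unit pixel spacing, normals of the form (-dz/dx, -dz/dy, 1) normalised with a positive z component, and two hand-set constants (`n_iters`, `lam`) that are hypothetical.

```python
import numpy as np


def refine_depth(z0, normals, n_iters=50, lam=0.1):
    """Fuse a rough depth map z0 (H, W) with unit normals (H, W, 3).

    Each iteration replaces every pixel by a weighted average of
    (a) the rough depth, with weight lam, and
    (b) the depth predicted from each 4-neighbour plus the slope
        implied by the normals, with weight 1 per neighbour.
    """
    z0 = np.asarray(z0, dtype=float)
    nz = np.clip(normals[..., 2], 1e-3, None)  # avoid division by ~0
    p = -normals[..., 0] / nz  # implied z[y, x+1] - z[y, x]
    q = -normals[..., 1] / nz  # implied z[y+1, x] - z[y, x]

    z = z0.copy()
    for _ in range(n_iters):
        acc = lam * z0
        cnt = np.full(z0.shape, lam)
        # Estimate from the right neighbour.
        acc[:, :-1] += z[:, 1:] - p[:, :-1]
        cnt[:, :-1] += 1
        # Estimate from the left neighbour.
        acc[:, 1:] += z[:, :-1] + p[:, :-1]
        cnt[:, 1:] += 1
        # Estimate from the neighbour below.
        acc[:-1, :] += z[1:, :] - q[:-1, :]
        cnt[:-1, :] += 1
        # Estimate from the neighbour above.
        acc[1:, :] += z[:-1, :] + q[:-1, :]
        cnt[1:, :] += 1
        z = acc / cnt
    return z
```

In this sketch, a rough depth map that already agrees with the normals is left unchanged, while an isolated depth spike is spread out and damped because it contradicts the slopes implied by the surrounding normals.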

Quantitative and Qualitative Evaluation

The proposed method is compared quantitatively against existing approaches such as SURREAL, BodyNet, and general depth estimation frameworks. The authors conducted experiments on a custom dataset of detailed RGBD captures and fused meshes and report that their method significantly reduces depth error, measured as mean absolute error (MAE). Cumulative Distribution Function (CDF) plots and accuracy metrics show that the framework outperforms the others, particularly in capturing the high-precision details needed for applications like telepresence.
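The two kinds of metric mentioned here, MAE and an error CDF, can be computed from per-pixel depth errors as in the sketch below. This is a generic illustration, not the paper's evaluation protocol: any alignment between prediction and ground truth, the units, and the threshold values are not specified in this summary, and the function name `depth_error_stats` is hypothetical.

```python
import numpy as np


def depth_error_stats(pred, gt, mask, thresholds):
    """Return (MAE, CDF values) over the foreground pixels.

    pred, gt   : (H, W) depth maps in the same units
    mask       : (H, W) boolean foreground mask
    thresholds : error levels at which to sample the CDF

    The CDF value at threshold t is the fraction of foreground pixels
    whose absolute depth error is at most t.
    """
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float))[mask]
    mae = float(err.mean())
    cdf = np.array([(err <= t).mean() for t in thresholds])
    return mae, cdf
```

Plotting the returned CDF values against the thresholds gives a curve where a method that rises faster has more pixels with small error.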

Qualitatively, the network's efficacy is showcased through examples processed from both the custom dataset and various unconstrained internet images. These illustrate the network's generalizability and robustness in recovering detailed depth information from everyday imagery.

Implications and Future Directions

The implications of this research are manifold. Detailed depth estimation from single images can greatly enhance applications in fields like augmented reality (AR), virtual reality (VR), and telepresence, where accurate 3D models are crucial for immersive experiences. The ability to capture fine details such as cloth wrinkles can significantly improve tasks requiring high fidelity, such as digital fashion and fabric simulation.

Looking towards future research, extending the model to multiple views or enforcing temporal consistency on video inputs could benefit real-time applications. The parameter-free refinement layer also suggests broader uses in other settings where surface normals can augment depth information.

Overall, this research is a substantial step forward in human depth estimation, bridging the gap between sparse skeletal models and fully detailed 3D body representations from single images. This work lays a foundation upon which more sophisticated models can build, moving towards truly realistic human-computer interaction interfaces.


GitHub

GitHub - sfu-gruvi-3dv/deep_human: Code for iccv2019 paper "A Neural Network for Detailed Human Depth Estimation from a Single Image" (62 stars)
