DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild (1803.02188v2)

Published 5 Mar 2018 in cs.CV

Abstract: In this work we use deep learning to establish dense correspondences between a 3D object model and an image "in the wild". We introduce "DenseReg", a fully-convolutional neural network (F-CNN) that densely regresses at every foreground pixel a pair of U-V template coordinates in a single feedforward pass. To train DenseReg we construct a supervision signal by combining 3D deformable model fitting and 2D landmark annotations. We define the regression task in terms of the intrinsic, U-V coordinates of a 3D deformable model that is brought into correspondence with image instances at training time. A host of other object-related tasks (e.g. part segmentation, landmark localization) are shown to be by-products of this task, and to largely improve thanks to its introduction. We obtain highly-accurate regression results by combining ideas from semantic segmentation with regression networks, yielding a 'quantized regression' architecture that first obtains a quantized estimate of position through classification, and refines it through regression of the residual. We show that such networks can boost the performance of existing state-of-the-art systems for pose estimation. Firstly, we show that our system can serve as an initialization for Statistical Deformable Models, as well as an element of cascaded architectures that jointly localize landmarks and estimate dense correspondences. We also show that the obtained dense correspondence can act as a source of 'privileged information' that complements and extends the pure landmark-level annotations, accelerating and improving the training of pose estimation networks. We report state-of-the-art performance on the challenging 300W benchmark for facial landmark localization and on the MPII and LSP datasets for human pose estimation.

Citations (199)

Summary

The paper introduces DenseReg, a fully convolutional network that maps every foreground pixel to a pair of U-V coordinates on a 3D template, giving a dense 2D-to-3D correspondence in one forward pass.
The paper presents a quantized regression architecture that first classifies each pixel into a coarse coordinate bin and then regresses the residual within that bin.
The paper reports state-of-the-art performance on the 300W benchmark for facial landmark localization and on the MPII and LSP datasets for human pose estimation.

DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild

The paper "DenseReg: Fully Convolutional Dense Shape Regression In-the-Wild" studies the use of a fully convolutional neural network (F-CNN) to establish dense correspondences between a 3D object model and images captured “in the wild.” It introduces the DenseReg system, which regresses a pair of U-V template coordinates at every foreground pixel of an image, yielding a dense 2D-to-3D correspondence in a single feedforward pass.

DenseReg is trained with a supervision signal constructed by combining 3D deformable model fitting with 2D landmark annotations. The regression task is defined in terms of the intrinsic U-V coordinates of a 3D deformable model that is brought into correspondence with image instances at training time. Related tasks, such as part segmentation and landmark localization, are by-products of this primary regression task, and the authors report that they improve substantially once it is introduced.
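To make the "by-product" idea concrete, here is a minimal sketch of how a landmark could be read off a dense U-V map: pick the foreground pixel whose predicted coordinates are closest to the landmark's position on the template. The function name `locate_landmark`, the array layout, and the nearest-neighbour lookup are assumptions made for illustration; the summary does not state how the authors extract landmarks.

```python
import numpy as np

def locate_landmark(uv_map, mask, target_uv):
    """Return (row, col) of the foreground pixel closest to target_uv.

    uv_map:    (H, W, 2) array of predicted U-V coordinates per pixel.
    mask:      (H, W) boolean array, True for foreground pixels.
    target_uv: template coordinates (u, v) of the landmark (hypothetical input).
    """
    diff = uv_map - np.asarray(target_uv, dtype=np.float64)
    dist = np.sum(diff ** 2, axis=-1)
    # Background pixels carry no correspondence, so exclude them.
    dist = np.where(mask, dist, np.inf)
    row, col = np.unravel_index(np.argmin(dist), dist.shape)
    return int(row), int(col)

# Toy U-V map: u grows with the column index, v with the row index.
H, W = 4, 5
rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
toy_uv = np.stack([cols / (W - 1), rows / (H - 1)], axis=-1)
toy_mask = np.ones((H, W), dtype=bool)
landmark_pixel = locate_landmark(toy_uv, toy_mask, (0.5, 0.34))
```

A part segmentation could be derived in the same spirit by assigning each foreground pixel to the template region its U-V value falls in, again under the assumption that such regions are defined on the template.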

The authors propose a “quantized regression” architecture that combines ideas from semantic segmentation with regression networks. It first obtains a quantized estimate of position through classification and then refines it by regressing the residual. The authors show that such networks can boost existing state-of-the-art pose estimation systems, either as an initialization for Statistical Deformable Models or as an element of cascaded architectures that jointly localize landmarks and estimate dense correspondences.
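The classify-then-refine decomposition can be sketched for a single coordinate in [0, 1] as follows. This is an illustration only: the number of bins, the function names `encode` and `decode`, and the choice of one residual output per bin are assumptions, not details taken from the paper.

```python
import numpy as np

def encode(u, num_bins=10):
    """Split coordinates in [0, 1] into a bin index and a residual inside the bin."""
    u = np.asarray(u, dtype=np.float64)
    # The value 1.0 is clipped into the last bin.
    bins = np.minimum(np.floor(u * num_bins).astype(np.int64), num_bins - 1)
    residual = u - bins / num_bins
    return bins, residual

def decode(class_scores, residuals, num_bins=10):
    """Recover coordinates from classification scores and per-bin residuals.

    class_scores: (..., num_bins) scores from the classification branch.
    residuals:    (..., num_bins) residual predicted for each bin (assumed layout).
    """
    bins = np.argmax(class_scores, axis=-1)
    # Keep only the residual that belongs to the winning bin.
    chosen = np.take_along_axis(residuals, bins[..., None], axis=-1)[..., 0]
    return bins / num_bins + chosen

# Round trip with idealised network outputs (one-hot scores, exact residuals).
u_true = np.array([0.0, 0.37, 0.999, 1.0])
target_bins, target_res = encode(u_true)
scores = np.eye(10)[target_bins]
per_bin_res = np.zeros((u_true.size, 10))
per_bin_res[np.arange(u_true.size), target_bins] = target_res
u_rec = decode(scores, per_bin_res)
```

During training, the bin indices would serve as classification targets and the residuals as regression targets, so the regression branch only has to cover the width of one bin rather than the full coordinate range.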

Experimentally, the dense correspondences are shown to act as a source of ‘privileged information’ that complements and extends pure landmark-level annotations, accelerating and improving the training of pose estimation networks. The authors report state-of-the-art results on the 300W benchmark for facial landmark localization and on the MPII and LSP datasets for human pose estimation.

In practical terms, DenseReg is relevant to computer vision applications that need a detailed spatial understanding of deformable objects such as faces and human bodies. Conceptually, it advances CNN-based dense correspondence estimation in a way that could be adapted to non-rigid object matching and other shape-aware tasks.

In summary, DenseReg provides a framework for dense shape regression that links 2D images to a 3D template with a single fully convolutional network, and it offers a basis for further work on mapping complex shapes from real-world images.
