GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints (1807.06294v2)

Published 17 Jul 2018 in cs.CV

Abstract: Learned local descriptors based on Convolutional Neural Networks (CNNs) have achieved significant improvements on patch-based benchmarks, whereas not having demonstrated strong generalization ability on recent benchmarks of image-based 3D reconstruction. In this paper, we mitigate this limitation by proposing a novel local descriptor learning approach that integrates geometry constraints from multi-view reconstructions, which benefits the learning process in terms of data generation, data sampling and loss computation. We refer to the proposed descriptor as GeoDesc, and demonstrate its superior performance on various large-scale benchmarks, and in particular show its great success on challenging reconstruction tasks. Moreover, we provide guidelines towards practical integration of learned descriptors in Structure-from-Motion (SfM) pipelines, showing the good trade-off that GeoDesc delivers to 3D reconstruction tasks between accuracy and efficiency.

Summary

Overview of "GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints"

The paper "GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints" presents an approach for learning local descriptors that integrates geometry constraints from multi-view reconstructions. The resulting descriptor, GeoDesc, is reported to perform better than prior descriptors on several large-scale benchmarks, particularly on demanding 3D reconstruction tasks. The work matters because local descriptors underpin many computer vision applications, such as panorama stitching, wide-baseline matching, and structure-from-motion (SfM), where gains in accuracy and efficiency carry over directly.

Methodology

The paper addresses the poor generalization of CNN-based local descriptors on image-based 3D reconstruction benchmarks. The authors propose a learning framework in which geometry constraints from multi-view reconstructions support three parts of training: data generation, data sampling, and loss computation. Concretely, the reconstructions are used to build a pipeline that generates well-annotated training data, to make data sampling more efficient, and to formulate a loss function that mitigates overfitting.

The network architecture follows the one used in L2-Net. What changes is the training: a geometric loss term is added, driven by geometry-based similarity measures. Patch similarity and image similarity are estimated from the reconstructed geometry, building on accuracy measurements used in multi-view stereo, and both feed into data sampling and loss formulation.
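The idea of letting geometric similarity shape the loss can be sketched as follows. This is an illustrative sketch only, not the paper's formulation: the function names, the hinge form, and the margin thresholds are all assumptions. It penalizes a matching patch pair when the cosine similarity of its descriptors falls below a margin, and the margin is chosen from a geometry-based patch similarity in [0, 1], so that geometrically easy pairs are held to a stricter standard than hard ones.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two descriptor vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def adaptive_margin(patch_similarity):
    """Pick a margin from a geometric patch similarity in [0, 1].

    The thresholds and margins are hypothetical values chosen for
    illustration; they are not taken from the paper.
    """
    if patch_similarity >= 0.5:
        return 0.7  # geometrically similar views: demand close descriptors
    if patch_similarity >= 0.2:
        return 0.5
    return 0.2  # large viewpoint change: tolerate a weaker match


def geometric_hinge_loss(pairs):
    """Mean hinge penalty over matching pairs.

    Each element of `pairs` is (descriptor_a, descriptor_b, patch_similarity).
    A pair contributes max(margin - cosine_similarity, 0).
    """
    total = 0.0
    for desc_a, desc_b, patch_similarity in pairs:
        margin = adaptive_margin(patch_similarity)
        total += max(margin - cosine_similarity(desc_a, desc_b), 0.0)
    return total / len(pairs)


if __name__ == "__main__":
    pairs = [
        ([1.0, 0.0], [1.0, 0.0], 0.9),  # identical descriptors, no penalty
        ([1.0, 0.0], [0.0, 1.0], 0.1),  # orthogonal descriptors, hard pair
    ]
    print(geometric_hinge_loss(pairs))
```

In a real training setup this term would be computed on batches of network outputs and combined with a descriptor-matching loss; the pure-Python version here only shows the shape of the computation.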

Empirical Results

GeoDesc was evaluated on several benchmarks: the patch-based HPatches benchmark, the image-based Heinly benchmark, and the ETH local features benchmark for 3D reconstruction. The results indicate that GeoDesc surpasses both traditional hand-crafted descriptors such as SIFT and RSIFT and recent learned descriptors such as L2-Net and HardNet in accuracy and robustness, with the largest gains under significant photometric and geometric variations.

On the HPatches benchmark, GeoDesc achieved a higher matching score and recall than the existing descriptors it was compared against. It shows strong invariance to geometric transformations in particular, which matters for practical image matching. The ETH benchmark further supports its usefulness in SfM: GeoDesc registered more images and reconstructed more sparse points than other state-of-the-art descriptors.

Practical Guidelines and Future Implications

The authors provide practical guidelines for integrating GeoDesc into SfM pipelines, covering how to choose the ratio criterion and how compact and scalable the descriptor is. This includes a compactness study that uses PCA to examine the explained variance, and advice on applying the ratio test, as in conventional SIFT-based pipelines, with a criterion suited to the descriptor.
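For readers unfamiliar with the ratio test mentioned above, here is a minimal sketch of the standard nearest-neighbor ratio criterion. The function names and the default threshold of 0.8 are assumptions made for illustration; the appropriate threshold depends on the descriptor and should be tuned rather than copied from SIFT-based defaults.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def ratio_test_matches(query, reference, ratio=0.8):
    """Match each query descriptor to its nearest reference descriptor.

    A match is kept only if the nearest distance is smaller than `ratio`
    times the second-nearest distance, which rejects ambiguous matches.
    `ratio=0.8` is an illustrative default, not a recommended value.
    Returns a list of (query_index, reference_index) pairs.
    """
    matches = []
    for q_idx, q in enumerate(query):
        # Sort reference descriptors by distance to the query descriptor.
        ranked = sorted(
            (l2_distance(q, r), r_idx) for r_idx, r in enumerate(reference)
        )
        if len(ranked) < 2:
            continue  # the test needs a second-nearest neighbor
        (best_dist, best_idx), (second_dist, _) = ranked[0], ranked[1]
        if best_dist < ratio * second_dist:
            matches.append((q_idx, best_idx))
    return matches


if __name__ == "__main__":
    query = [[0.0, 0.0], [5.0, 5.0]]
    reference = [[0.1, 0.0], [3.0, 0.0], [5.0, 4.0], [5.0, 6.0]]
    # The first query has one clearly nearest neighbor; the second is
    # equidistant from two references and is rejected as ambiguous.
    print(ratio_test_matches(query, reference))
```

The brute-force search is quadratic in the number of descriptors; a production SfM pipeline would typically use an approximate nearest-neighbor index instead, with the same acceptance rule.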

Understanding the implications of GeoDesc is critical for future developments in AI, especially in domains where precise localization and mapping are essential, like autonomous navigation and augmented reality. The integration of geometry constraints as proposed could inform future work in developing even more generalizable descriptors, with potential applications in broader computer vision tasks beyond reconstruction, such as semantic matching and recognition.

In conclusion, GeoDesc represents a meaningful step forward for learned local descriptors, with reported improvements in accuracy and a favorable trade-off between accuracy and efficiency in reconstruction. By tying descriptor learning more closely to geometric constraints, this research sets a promising precedent for future work in computer vision.