Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions (2004.10566v1)

Published 22 Apr 2020 in cs.CV

Abstract: In this work we target the problem of estimating accurately localised correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks that have demonstrated promising performance for difficult correspondence problems and propose modifications to overcome their main limitations: large memory consumption, large inference time and poorly localised correspondences. Our proposed modifications can reduce the memory footprint and execution time more than $10\times$, with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches, and its subsequent processing with a 4D CNN using submanifold sparse convolutions. Localisation accuracy is significantly improved by processing the input images in higher resolution, which is possible due to the reduced memory footprint, and by a novel two-stage correspondence relocalisation module. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localisation benchmarks, and competitive results in the Aachen Day-Night benchmark.

Summary

The paper introduces a sparse 4D CNN framework that cuts memory footprint and execution time by more than 10× with equivalent matching results.
It employs a two-stage correspondence relocalisation mechanism that refines match coordinates from grid-level estimates to sub-pixel accuracy.
The method achieves state-of-the-art results on the HPatches Sequences and InLoc benchmarks and competitive results on Aachen Day-Night, which is relevant to visual localisation and 3D reconstruction.

Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions

The paper presents Sparse-NCNet, an image matching method that addresses three limitations of Neighbourhood Consensus Networks (NCNet): large memory consumption, long inference time and poorly localised correspondences. Its main contribution is to sparsify the correlation tensor that holds the tentative matches and to process it with a 4D convolutional neural network (CNN) built from submanifold sparse convolutions. This reduces memory footprint and execution time by more than 10× with equivalent results. The memory saved makes it possible to process the input images at higher resolution, and a two-stage correspondence relocalisation module further improves the localisation accuracy of the matches.

Methodology

Sparse-NCNet keeps only the most promising matches in the correlation tensor and processes the resulting sparse tensor with a sparse CNN. Submanifold sparse convolutions compute outputs only at the positions that are already active, so the tensor stays sparse from layer to layer and no computation is spent on discarded matches. The sparse correlation tensor is filtered by a permutation-invariant CNN, which propagates information within local neighbourhoods and makes the matches more robust.
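
The two ideas above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of keeping the top `k` matches per position in both directions, and the single-channel 3×3×3×3 kernel are assumptions made for illustration. For brevity the sketch builds the dense correlation matrix before selecting entries, which a memory-efficient implementation would avoid.

```python
import numpy as np
from itertools import product


def sparsify_correlation(feat_a, feat_b, k=2):
    """Keep the k strongest matches per position, in both directions.

    feat_a: (Ha, Wa, C) and feat_b: (Hb, Wb, C) dense feature maps.
    Returns 4D coordinates (M, 4) as (ia, ja, ib, jb) and their scores (M,).
    """
    ha, wa, c = feat_a.shape
    hb, wb, _ = feat_b.shape
    fa = feat_a.reshape(-1, c)
    fb = feat_b.reshape(-1, c)
    fa = fa / np.linalg.norm(fa, axis=1, keepdims=True)
    fb = fb / np.linalg.norm(fb, axis=1, keepdims=True)
    corr = fa @ fb.T  # dense (Ha*Wa, Hb*Wb) cosine similarities

    # Top-k positions in B for every position in A.
    top_b = np.argpartition(-corr, k - 1, axis=1)[:, :k]
    rows_ab = np.repeat(np.arange(corr.shape[0]), k)
    cols_ab = top_b.ravel()
    # Top-k positions in A for every position in B.
    top_a = np.argpartition(-corr, k - 1, axis=0)[:k, :]
    rows_ba = top_a.ravel()
    cols_ba = np.tile(np.arange(corr.shape[1]), k)

    pairs = np.stack([np.concatenate([rows_ab, rows_ba]),
                      np.concatenate([cols_ab, cols_ba])], axis=1)
    pairs = np.unique(pairs, axis=0)  # drop matches selected twice
    values = corr[pairs[:, 0], pairs[:, 1]]
    ia, ja = np.unravel_index(pairs[:, 0], (ha, wa))
    ib, jb = np.unravel_index(pairs[:, 1], (hb, wb))
    return np.stack([ia, ja, ib, jb], axis=1), values


def submanifold_conv4d(coords, values, kernel):
    """Single-channel submanifold convolution over sparse 4D entries.

    Outputs are computed only at active sites, and only active neighbours
    contribute, so the set of non-zero positions is unchanged.
    """
    active = {tuple(c): v for c, v in zip(coords.tolist(), values.tolist())}
    r = kernel.shape[0] // 2
    out = np.zeros(len(coords))
    for n, site in enumerate(active):  # same order as coords (unique rows)
        acc = 0.0
        for off in product(range(-r, r + 1), repeat=4):
            nb = tuple(s + o for s, o in zip(site, off))
            if nb in active:
                acc += kernel[tuple(o + r for o in off)] * active[nb]
        out[n] = acc
    return out
```

With `k` matches per position the number of stored entries grows with the number of grid positions rather than with their product, which is where the memory saving comes from in this sketch.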

To address poorly localised correspondences, Sparse-NCNet adds a two-stage relocalisation mechanism. A hard relocalisation step first refines each match by searching a local region for the most likely position on a grid of quadrupled resolution. A soft relocalisation step then applies a softargmax to reach sub-pixel accuracy, which makes the matches more useful for high-precision tasks such as visual localisation and 3D reconstruction.
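
As a rough illustration of the two stages, the sketch below picks the best fine-grid position inside a local window (hard step) and then computes a softmax-weighted expected offset (soft step). The function names, the window sizes and the `temperature` parameter are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def hard_relocalise(fine_scores):
    """Return the (row, col) of the highest score in a local fine-grid window."""
    idx = np.unravel_index(np.argmax(fine_scores), fine_scores.shape)
    return tuple(int(i) for i in idx)


def softargmax_offset(scores, temperature=1.0):
    """Sub-pixel offset from the window centre as a softmax-weighted mean.

    scores: (h, w) match scores around the hard-relocalised position.
    Returns (offset_y, offset_x) in units of fine-grid cells.
    """
    scores = np.asarray(scores, dtype=float)
    h, w = scores.shape
    logits = scores / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()
    dy = np.arange(h) - (h - 1) / 2.0  # e.g. [-1, 0, 1] for h = 3
    dx = np.arange(w) - (w - 1) / 2.0
    off_y = float((weights.sum(axis=1) * dy).sum())
    off_x = float((weights.sum(axis=0) * dx).sum())
    return off_y, off_x
```

A lower `temperature` makes the soft step behave more like a plain argmax, while a higher one averages over more of the window.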

Results

Sparse-NCNet is evaluated on three benchmarks: HPatches Sequences, InLoc and Aachen Day-Night. On HPatches Sequences it obtains state-of-the-art results under both viewpoint and illumination changes. On the InLoc benchmark for indoor localisation it also obtains state-of-the-art results, which supports the view that performing feature extraction, matching and filtering in a single pipeline gives robust correspondences. On the Aachen Day-Night benchmark, which requires localising night-time queries against daytime imagery, its results are competitive with the leading methods. These results come with the reduction in memory and execution time of more than 10× described above.

Implications and Future Work

Sparse-NCNet shows that sparse representations can be used within CNN architectures for image matching to balance computational efficiency against matching performance. The lower memory and run-time cost makes the approach better suited to resource-constrained settings and to applications such as large-scale 3D reconstruction and navigation.

Future research could explore sparse convolutional networks of this kind in other domains, such as video processing, and extend the model to multispectral data for greater robustness across environmental conditions. The relocalisation strategy could also be improved, for example with more advanced interpolation techniques that address remaining localisation errors.
