Multi-view Dense Image Matching with Similarity Learning and Geometry Priors
This paper presents an approach to multi-view dense image matching built on deep neural networks for multi-view similarity learning. Its main contribution is the integration of geometry priors, specifically epipolar geometry and homography, with deep learning techniques to improve 3D surface reconstruction from aerial and satellite imagery.
Methodology and Contributions
- Similarity Learning Framework: The framework relies on a suite of deep neural networks for multi-view similarity learning. Training exploits epipolar geometry, which constrains corresponding pixels either along the epipolar line or through homography rectification, so that geometry-aware features are learned from the native images without the laborious creation of multi-view training datasets (a minimal training sketch follows this list).
- Geometry Priors: Online geometry priors are integrated into the framework so that epipolar-based features can be adapted effectively for multi-view reconstruction. This is key to removing rotational ambiguity between views and simplifying the matching task across multiple viewpoints.
- Plane Sweeping Technique: At inference, plane sweeping projects the geometry-aware features onto candidate depth hypotheses, which builds and regularizes the cost volume and improves multi-view surface reconstruction compared with traditional dense matching approaches (see the plane-sweep sketch after this list).
- Model Architecture and Training: The research compares several architectures, namely MS-AFF, U-Net, and attention U-Net variants, each offering a different trade-off between expressivity and generalizability (a minimal U-Net-style sketch follows this list). The models learn features in a self-supervised manner from matching and non-matching samples, without direct reliance on dense ground-truth disparity maps.
- Integration with MicMac: The pipeline integrates into the existing MicMac software, so it fits standard multi-resolution image matching pipelines and can be used directly in routine photogrammetric workflows, underlining the scalability and practical applicability of the method.
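
The first sketch below makes the similarity-learning setup more concrete. Assuming PyTorch, it shows a small shared feature extractor and a hinge-style contrastive loss in which, after epipolar rectification, positive samples are true correspondences on the same image row and negatives are offsets along that same epipolar line. `FeatureNet`, the margin, the negative offset, and the use of sparse geometry-derived correspondences are illustrative assumptions, not the paper's exact networks or sampling recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureNet(nn.Module):
    """Small fully convolutional extractor shared by all views (illustrative; single-band input assumed)."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1),
        )

    def forward(self, x):
        # Unit-norm descriptors so that dot products are cosine similarities.
        return F.normalize(self.net(x), dim=1)

def epipolar_hinge_loss(f_ref, f_sec, rows, cols_ref, cols_sec,
                        margin=0.2, neg_offset=6):
    """Hinge contrastive loss on epipolar-rectified feature maps.

    f_ref, f_sec       : (1, C, H, W) features of the rectified reference / secondary image
    rows               : (N,) sampled rows; after rectification a true match keeps its row
    cols_ref, cols_sec : (N,) matching columns, e.g. from sparse geometry-derived
                         correspondences rather than dense ground-truth disparity
    Indices are assumed to stay within image bounds for brevity.
    """
    d_ref = f_ref[0, :, rows, cols_ref]                   # (C, N) reference descriptors
    d_pos = f_sec[0, :, rows, cols_sec]                   # true matches on the same row
    d_neg = f_sec[0, :, rows, cols_sec + neg_offset]      # non-matches along the epipolar line
    s_pos = (d_ref * d_pos).sum(0)                        # similarity of matching pairs
    s_neg = (d_ref * d_neg).sum(0)                        # similarity of non-matching pairs
    return F.relu(margin + s_neg - s_pos).mean()          # matches must win by the margin
```

Because the descriptors are unit-normalized, the dot products are cosine similarities, and the loss only requires a match to be more similar than a non-match along the epipolar line by a fixed margin, which is what allows training without dense ground-truth disparity maps.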
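At inference, the plane-sweeping step can be sketched as follows (again assuming PyTorch): for each candidate depth, secondary-view features are warped into the reference view through the homography induced by a fronto-parallel plane at that depth, and a per-pixel similarity score is stored in a depth-indexed cost volume. The pinhole calibration `K_ref`, `K_sec`, the relative pose `R`, `t`, the fronto-parallel sweep planes, and the dot-product score are simplifying assumptions; the paper's cost construction and regularization are not reproduced here.

```python
import torch
import torch.nn.functional as F

def plane_homography(K_ref, K_sec, R, t, depth):
    """Homography mapping reference pixels onto the secondary view for the
    fronto-parallel plane z = depth expressed in the reference camera frame."""
    n = torch.tensor([[0.0, 0.0, 1.0]])                    # plane normal as a row vector
    return K_sec @ (R - t.view(3, 1) @ n / depth) @ torch.linalg.inv(K_ref)

def plane_sweep_cost_volume(feat_ref, feat_sec, K_ref, K_sec, R, t, depths):
    """feat_ref, feat_sec: (1, C, H, W) feature maps; returns a (D, H, W) score volume."""
    _, _, h, w = feat_ref.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1)      # (H, W, 3) homogeneous pixels
    volume = torch.zeros(len(depths), h, w)
    for i, depth in enumerate(depths):
        H_d = plane_homography(K_ref, K_sec, R, t, depth).to(torch.float32)
        mapped = pix @ H_d.T                                      # reference -> secondary pixels
        uv = mapped[..., :2] / mapped[..., 2:].clamp(min=1e-6)
        # Normalize to [-1, 1] and resample the secondary features in the reference frame.
        grid = torch.stack([2 * uv[..., 0] / (w - 1) - 1,
                            2 * uv[..., 1] / (h - 1) - 1], dim=-1).unsqueeze(0)
        warped = F.grid_sample(feat_sec, grid, align_corners=True)
        # Dot-product score: high where the depth hypothesis lies on the true surface.
        volume[i] = (feat_ref * warped).sum(dim=1)[0]
    return volume
```

An argmax (or a regularized variant) over the depth axis of the resulting volume then yields a depth hypothesis per reference pixel.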
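Finally, to indicate the family of feature extractors being compared (MS-AFF, U-Net, attention U-Net), here is a deliberately tiny U-Net-style encoder-decoder that outputs dense per-pixel descriptors. Channel widths, depth, and the absence of attention gates or multi-scale fusion are illustrative simplifications and do not reflect the paper's actual architectures.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    """Two-level encoder-decoder returning dense per-pixel descriptors.
    Input height and width are assumed to be even so the skip connection aligns."""
    def __init__(self, c_in=1, c_feat=64):
        super().__init__()
        self.enc1 = conv_block(c_in, 32)                 # full-resolution encoder
        self.enc2 = conv_block(32, 64)                   # half-resolution encoder
        self.pool = nn.MaxPool2d(2)
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(64 + 32, c_feat)          # decoder with skip connection

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        return self.dec1(torch.cat([self.up(e2), e1], dim=1))   # (B, c_feat, H, W)
```

Deeper variants with attention gates on the skip connections correspond to the attention U-Net family mentioned above.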
Results and Evaluation
The method demonstrates strong generalization across both aerial and satellite imagery with varied ground sampling distances. Detailed quantitative evaluations show that the approach outperforms existing similarity learning networks on several metrics, including accuracy, robustness, and completeness. The learned representations transfer well to unseen landscapes, underscoring the method's suitability for large-scale 3D reconstruction projects.
Implications and Future Directions
The paper confirms the practical value of incorporating geometry priors into multi-view image matching. Integrating these priors into similarity learning frameworks yields more robust and adaptable solutions for multi-view configurations, and the findings pave the way for new architectures that efficiently exploit geometry awareness in stereopsis tasks.
Moreover, the paper opens avenues for further research on refining geometric representations within neural network architectures. Future work might explore more sophisticated geometric transformations and deep learning paradigms that embed spatial understanding more deeply into model training. Such developments could substantially improve performance, particularly across diverse environmental and geographic contexts.
In conclusion, this research offers a thorough examination of multi-view dense image matching that combines neural similarity learning with geometry priors, marking a significant step forward in 3D surface reconstruction.