- The paper proposes a pseudo-siamese CNN with separate streams for SAR and optical data to learn and identify corresponding image patches.
- A novel dataset generation approach is used based on the "SARptical" framework, enabling reliable ground-truth creation for training the model.
- Experimental evaluation shows promising matching accuracy, representing meaningful progress in multi-sensor data integration for remote sensing applications.
Identifying Corresponding Patches in SAR and Optical Images with a Pseudo-Siamese CNN
The paper under discussion presents a method for identifying corresponding patches in synthetic aperture radar (SAR) and optical images via a pseudo-siamese convolutional neural network (CNN) architecture. The research addresses a longstanding challenge in remote sensing: matching images from different sensor modalities, specifically SAR and optical imagery, whose appearance differs fundamentally because the sensors rely on different imaging principles (active, range-based microwave measurement versus passive, perspective optical projection). This disparity makes direct comparison and matching a non-trivial task, particularly for very-high-resolution (VHR) data over complex urban environments.
Network Architecture and Training
The proposed solution is a pseudo-siamese CNN, which diverges from a conventional siamese network by maintaining two convolutional streams of identical architecture but unshared weights, one for SAR and one for optical patches. This design lets each stream learn features tailored to its modality before the two are merged at a fusion stage. Each stream comprises eight convolutional layers; the resulting features are concatenated and passed through fully connected layers, and the whole network is optimized with a binary cross-entropy loss. After training, the model outputs a binary decision on whether two patches correspond.
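The paper does not ship reference code, but the two-stream fusion is straightforward to sketch. The following PyTorch snippet is a minimal illustration of the described design; the `make_stream` factory, the 512-unit hidden layer, and the single-logit output are assumptions for illustration, not the authors' exact configuration (a stream implementation is sketched after the next paragraph).

```python
import torch
import torch.nn as nn

class PseudoSiameseNet(nn.Module):
    """Two streams with identical layout but unshared weights, fused by
    concatenation and classified by fully connected layers."""

    def __init__(self, make_stream, feat_dim):
        super().__init__()
        # Calling the factory twice yields two independent parameter sets.
        self.sar_stream = make_stream()
        self.opt_stream = make_stream()
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 512),  # hidden width is an assumption
            nn.ReLU(inplace=True),
            nn.Linear(512, 1),             # single logit: match / no match
        )

    def forward(self, sar_patch, opt_patch):
        f_sar = torch.flatten(self.sar_stream(sar_patch), start_dim=1)
        f_opt = torch.flatten(self.opt_stream(opt_patch), start_dim=1)
        fused = torch.cat([f_sar, f_opt], dim=1)  # fusion by concatenation
        return self.classifier(fused)

# Binary cross-entropy on the fused prediction (label 1 = corresponding).
criterion = nn.BCEWithLogitsLoss()
```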
The network uses small 3×3 filters together with non-overlapping max-pooling layers, a configuration that increases non-linearity (more activations per receptive field than a single large filter) while preserving spatial detail, helping the network capture distinct directional patterns. Training further employs batch normalization and dropout to reduce overfitting and improve convergence.
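Continuing the sketch above, a stream could be assembled from such blocks; the grouping of the eight 3×3 convolutions into four pooled blocks, the channel widths, and the single-channel input are illustrative assumptions, not the paper's exact layout:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs=2, drop_p=0.5):
    """n_convs 3x3 convolutions, then a non-overlapping 2x2 max-pool;
    batch normalization and dropout regularize training."""
    layers = []
    for i in range(n_convs):
        layers += [
            nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                      kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        ]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # stride == size
    layers.append(nn.Dropout2d(drop_p))
    return nn.Sequential(*layers)

def make_stream():
    # Eight 3x3 conv layers in total, grouped into four pooled blocks.
    return nn.Sequential(
        conv_block(1, 32),     # single-channel (grayscale) input assumed
        conv_block(32, 64),
        conv_block(64, 128),
        conv_block(128, 128),
    )

# For 64x64 patches, four 2x2 pools leave 4x4x128 features per stream:
# net = PseudoSiameseNet(make_stream, feat_dim=128 * 4 * 4)
```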
Dataset and Methodology
A significant contribution of the paper is its dataset generation approach, which leverages the "SARptical" framework originally developed for matching SAR and optical imagery in 3D point-cloud space. The data pool consists of automatically generated SAR and optical patch pairs whose centers are aligned through this 3D matching process, so reliable ground-truth correspondences are derived without manual annotation. The dataset is split into disjoint training and test partitions so that the reported metrics reflect generalization to unseen data.
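The paper derives its patch pairs from the SARptical alignment itself; the sketch below shows one plausible way to turn co-registered patch pairs into labeled training examples. The negative-sampling strategy and the `matched_patches` structure are assumptions for illustration, not the authors' exact procedure:

```python
import random

def build_pairs(matched_patches, neg_per_pos=1, seed=0):
    """Turn co-registered (sar_patch, opt_patch) tuples into labeled pairs:
    label 1 for true correspondences, 0 for randomly mismatched ones."""
    rng = random.Random(seed)
    pairs = []
    for sar, opt in matched_patches:
        pairs.append((sar, opt, 1))  # positive: centers aligned in 3D
        for _ in range(neg_per_pos):
            # Negative: pair the SAR patch with a non-corresponding
            # optical patch drawn at random from the pool.
            _, wrong_opt = rng.choice(matched_patches)
            if wrong_opt is not opt:
                pairs.append((sar, wrong_opt, 0))
    rng.shuffle(pairs)
    return pairs
```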
Experimental Evaluation
The paper evaluates several patch sizes and analyzes their impact on network performance. The results indicate a clear trade-off between patch size and discriminative power: larger patches yield higher accuracy because they include more spatial context, which is especially important given the range-dependent geometric distortions of SAR imagery.
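A patch-size study of this kind amounts to training one network per size and comparing test accuracy; a minimal evaluation loop might look as follows (the dictionary layout and the 0.5 decision threshold are assumptions for illustration):

```python
import torch

@torch.no_grad()
def accuracy_by_patch_size(models, test_loaders):
    """models: {patch_size: trained PseudoSiameseNet};
    test_loaders: {patch_size: DataLoader of (sar, opt, label) batches}."""
    results = {}
    for size, model in models.items():
        model.eval()
        correct, total = 0, 0
        for sar, opt, label in test_loaders[size]:
            logits = model(sar, opt).squeeze(1)
            pred = (torch.sigmoid(logits) > 0.5).long()  # threshold at 0.5
            correct += (pred == label).sum().item()
            total += label.numel()
        results[size] = correct / total
    return results
```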
The experiments show that the pseudo-siamese CNN achieves high correspondence prediction accuracy on the held-out test set. However, matching accuracy degrades under conditions of extreme visual disparity between the modalities, such as severe layover effects in SAR images.
Implications and Future Directions
The proposed method marks a clear step forward in multi-sensor data integration, offering a pathway toward automatic and scalable image correspondence matching. While the results are promising, the research also opens avenues for further work, such as extending the dataset to additional sensor types or integrating the network into dense matching pipelines, which would broaden its applicability beyond isolated key-point matching.
In conclusion, this research contributes a methodologically sound and technically innovative approach to cross-sensor image matching, using deep learning to tackle the intricacies of multi-modal remote sensing data. The insights and evidence presented lay the groundwork for more generalized SAR-optical image matching procedures and, ultimately, for more comprehensive and efficient remote sensing image analysis.