
Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review (2205.01380v1)

Published 3 May 2022 in cs.CV, cs.LG, and eess.SP

Abstract: With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which offers researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these developed traditional algorithms inevitably meet the performance bottleneck due to the lack of the ability to comprehensively analyse and interpret these strongly heterogeneous data. Hence, this non-negligible limitation further arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has witnessed remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyse the trends of this field. Some prevalent sub-fields in multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize some valuable resources for the sake of the development in multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.

Authors (7)
  1. Jiaxin Li (57 papers)
  2. Danfeng Hong (65 papers)
  3. Lianru Gao (16 papers)
  4. Jing Yao (56 papers)
  5. Ke Zheng (3 papers)
  6. Bing Zhang (435 papers)
  7. Jocelyn Chanussot (89 papers)
Citations (285)

Summary

Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review

The paper "Deep Learning in Multimodal Remote Sensing Data Fusion: A Comprehensive Review" provides an in-depth analysis of the integration of deep learning (DL) with multimodal remote sensing (RS) data fusion. The authors present a comprehensive review motivated by the increasing availability of heterogeneous Earth observation (EO) data derived from advanced RS technologies. As traditional multimodal RS data fusion methodologies confront performance limitations due to their inadequate capacity to analyze and interpret the heterogeneity of these data, this review shifts the focus to the superior performance of DL techniques.

Key Contributions and Methodologies

The authors meticulously categorize the existing approaches into homogeneous and heterogeneous fusion techniques. Homogeneous fusion encompasses spatiospectral fusion (e.g., pansharpening, hyperspectral (HS) pansharpening, and hyperspectral-multispectral (HS-MS) fusion) and spatiotemporal fusion. In contrast, heterogeneous fusion explores light detection and ranging (LiDAR) alongside optical data, synthetic aperture radar (SAR) with optical data, and the fusion of RS with geospatial big data (GBD).

Homogeneous Fusion

  • Pansharpening: This encompasses the fusion of multispectral (MS) and panchromatic (Pan) images to generate high-resolution outputs. Various DL architectures such as autoencoders (AE), convolutional neural networks (CNN), and generative adversarial networks (GAN) are employed to overcome limitations of traditional alternatives.
  • HS Pansharpening and HS-MS Fusion: These methodologies leverage DL to fuse HS imagery with Pan or MS data, preserving spectral fidelity while enhancing spatial resolution. Techniques include both supervised and unsupervised models, along with hybrid networks that combine traditional methodologies with DL capabilities.
  • Spatiotemporal Fusion: This technique addresses temporal resolution discrepancies by utilizing DL frameworks, such as CNN and GAN, to model spatial-temporal dependencies and generate temporally dense high-resolution reflectance products.
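As context for the DL-based pansharpening methods above, the classical baselines they aim to surpass are often simple detail-injection or band-ratio schemes. A minimal NumPy sketch of one such baseline, the Brovey transform, is shown below (illustrative only; this specific implementation is not taken from the paper):

```python
import numpy as np

def brovey_pansharpen(ms, pan, eps=1e-8):
    """Brovey-transform pansharpening: rescale each multispectral (MS)
    band by the ratio of the panchromatic (Pan) image to the MS
    intensity (per-pixel band mean), injecting Pan spatial detail.

    ms  : (H, W, B) MS image, already upsampled to the Pan grid
    pan : (H, W) panchromatic image
    """
    intensity = ms.mean(axis=2, keepdims=True)          # (H, W, 1)
    ratio = pan[..., None] / (intensity + eps)          # Pan/intensity per pixel
    return ms * ratio                                   # sharpened (H, W, B)
```

When the Pan image equals the MS intensity everywhere, the ratio is 1 and the MS image is returned unchanged; spatial detail is injected exactly where Pan and the MS intensity differ.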

Heterogeneous Fusion

  • LiDAR-Optical Fusion: The paper emphasizes DL architectures that effectively combine spectral HS data with LiDAR, enhancing performance in land-use and land-cover (LULC) classification by integrating 3-D spatial geometry with spectral information.
  • SAR-Optical and RS-GBD Fusion: The integration focuses on leveraging the complementary characteristics of SAR and optical images. The fusion of RS with GBD such as points of interest (POI) and user-generated content is highlighted for urban functional zone classification.
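The simplest heterogeneous-fusion strategy discussed, feature-level fusion, amounts to normalizing each modality and stacking their per-pixel features before classification. A minimal NumPy sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def stack_features(hs, dsm, eps=1e-8):
    """Feature-level LiDAR-optical fusion: concatenate per-pixel
    hyperspectral (HS) spectra with a LiDAR-derived height channel
    (digital surface model, DSM) before feeding a classifier.

    hs  : (H, W, B) hyperspectral cube
    dsm : (H, W) LiDAR-derived height map
    """
    # Standardize each modality so neither dominates the fused vector.
    hs_n = (hs - hs.mean()) / (hs.std() + eps)
    dsm_n = (dsm - dsm.mean()) / (dsm.std() + eps)
    # Append height as one extra channel: (H, W, B + 1).
    return np.concatenate([hs_n, dsm_n[..., None]], axis=2)
```

Deeper variants replace this fixed concatenation with learned, modality-specific encoder branches whose features are merged inside the network, but the underlying idea of joining 3-D geometry with spectral information is the same.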

Challenges and Prospects

The paper outlines critical challenges in the field, chiefly related to image registration, quality assessment, and interpretability of DL models. The focus on image registration underscores the necessity for coordinated preprocessing steps in fusion scenarios. The discussion on quality assessment indicates an insufficient emphasis on application-oriented evaluation metrics, urging future research to develop more precise and relevant quality indicators. The authors also advocate for advancements in interpretable DL models, emphasizing the need for methods that increase transparency and understanding of the DL models' internal workings during fusion tasks.

Looking ahead, the field could benefit from transitioning toward cross-modal learning paradigms. By enabling models to work effectively with incomplete data sets, such a shift would broaden the potential applications and deployability of multimodal RS data fusion technologies.

Conclusion

The paper consolidates the pivotal role of DL in unlocking the potential of multimodal RS data fusion, proposing a roadmap for overcoming existing challenges. By providing detailed analyses and concrete recommendations, this review serves as a valuable resource for researchers seeking to both understand and further innovate within the domain of multimodal RS data fusion using deep learning. Through robust scholarly discussion, it lays the groundwork for the evolution of fusion paradigms in this highly dynamic research area.