- The paper introduces the C2Seg benchmark and the HighDAN network as innovative solutions to overcome domain generalization challenges in cross-city semantic segmentation.
- It combines multimodal remote sensing data (hyperspectral, multispectral, and SAR imagery) with adversarial domain adaptation strategies to capture detailed urban semantics across cities.
- Comprehensive experiments demonstrate significant improvements in accuracy, mean IoU, and F1 scores compared to state-of-the-art segmentation models.
Cross-City Matters: A Multimodal Remote Sensing Benchmark Dataset for Cross-City Semantic Segmentation using High-Resolution Domain Adaptation Networks
In their paper, the authors focus on a pertinent challenge within the domain of remote sensing: the bottleneck faced by AI models when attempting to generalize from single to multiple urban environments. The research introduces a multimodal remote sensing dataset, named C2Seg, for cross-city semantic segmentation and proposes a high-resolution domain adaptation network, HighDAN, to enhance the generalization capability of AI models across varied urban locales.
The paper begins by recognizing the limitations of current AI models that perform effectively within isolated urban settings but falter when tasked with cross-city analyses. This inadequacy arises mainly due to the homogeneous nature of data in individual environments, which leads to poor model generalization when encountering the diverse conditions of multiple cities. The proposed solution encompasses creating cross-city benchmark datasets containing multimodal remote sensing data and designing an innovative network that synergizes high-resolution features with domain adaptation strategies.
Dataset Construction
The C2Seg datasets, a salient contribution of this work, incorporate diverse modalities, integrating hyperspectral, multispectral, and synthetic aperture radar (SAR) data. They encompass two cross-city scenes: Berlin-Augsburg in Germany and Beijing-Wuhan in China, acquired from the EnMAP, Sentinel, and Gaofen satellite platforms. The inclusion of multimodal data supports the capture of complementary information necessary for effective cross-domain semantic segmentation. Notably, C2Seg is, according to the authors, the first benchmark dataset comprising three-modal remote sensing images for cross-city segmentation tasks. The datasets are openly available to the scientific community, which should propel further research in this area.
High-Resolution Domain Adaptation Network (HighDAN)
The HighDAN architecture combines a high-resolution network (HR-Net) backbone with adversarial learning strategies to bridge domain gaps and bolster feature learning. It retains high-to-low-resolution features in parallel, enhancing the model's ability to capture detailed urban semantics. HighDAN employs adversarial domain adaptation to align feature representations between source and target domains at both the feature and category levels, ensuring effective cross-domain knowledge transfer. Moreover, the Dice loss function is introduced to address class imbalance, a common issue in cross-city and regional segmentation tasks.
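To make the class-imbalance remedy concrete, a soft multi-class Dice loss can be sketched as follows. This is an illustrative NumPy version under common conventions (per-pixel class probabilities against one-hot labels, per-class Dice averaged), not the authors' implementation:

```python
import numpy as np

def soft_dice_loss(probs, onehot, eps=1e-6):
    """Soft multi-class Dice loss.

    probs  : (N, C) predicted class probabilities per pixel
    onehot : (N, C) one-hot ground-truth labels
    Because the Dice score is averaged per class, rare classes weigh as
    much as frequent ones -- the usual motivation for Dice under imbalance.
    """
    intersection = (probs * onehot).sum(axis=0)        # per-class overlap
    denom = probs.sum(axis=0) + onehot.sum(axis=0)     # per-class total mass
    dice = (2.0 * intersection + eps) / (denom + eps)  # per-class Dice score
    return 1.0 - dice.mean()

# Toy example: 4 pixels, 3 classes (values are made up for illustration).
probs = np.array([[0.9, 0.05, 0.05],
                  [0.1, 0.8,  0.1 ],
                  [0.2, 0.2,  0.6 ],
                  [0.7, 0.2,  0.1 ]])
onehot = np.eye(3)[[0, 1, 2, 0]]
loss = soft_dice_loss(probs, onehot)   # 0 for a perfect prediction
```

A perfect prediction drives the loss to zero regardless of how unevenly pixels are distributed across classes, which is what distinguishes it from a plain per-pixel cross-entropy.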
Results and Implications
Extensive experiments validate HighDAN's effectiveness for cross-city semantic segmentation: it yields higher overall accuracy, mean IoU, and F1 scores across diverse classes than existing state-of-the-art models. This indicates the potential of HighDAN to advance urban environmental understanding across different geographic locations.
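All three reported metrics can be derived from a single class confusion matrix. The following minimal sketch shows the standard definitions; it is an illustration, not the authors' evaluation code, and it assumes every class appears at least once:

```python
import numpy as np

def segmentation_metrics(conf):
    """Compute overall accuracy, mean IoU, and mean F1 from a (C, C)
    confusion matrix whose rows are ground truth and columns predictions."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class c, but wrong
    fn = conf.sum(axis=1) - tp          # pixels of class c that were missed
    oa = tp.sum() / conf.sum()          # overall (pixel) accuracy
    iou = tp / (tp + fp + fn)           # per-class intersection over union
    f1 = 2 * tp / (2 * tp + fp + fn)    # per-class F1 (Dice) score
    return oa, iou.mean(), f1.mean()

# Toy 3-class confusion matrix (counts are made up for illustration).
conf = np.array([[50,  2,  3],
                 [ 4, 40,  6],
                 [ 1,  5, 30]])
oa, miou, mf1 = segmentation_metrics(conf)
```

Note that mean IoU is always at most mean F1 (IoU = F1 / (2 - F1) per class), which is why mIoU is the stricter of the two summary numbers.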
The implications of this research are twofold. Practically, the availability of the C2Seg datasets and HighDAN might facilitate enhanced urban resource allocation, land use planning, and geospatial analytics across cities with varying characteristics. Theoretically, this work bridges existing gaps in multimodal data integration and domain adaptation, which are crucial for advancing generalization capabilities in remote sensing analytics.
Future Developments
Looking ahead, this research could catalyze further exploration into integrating explicit knowledge, such as geographical, climatic, and morphological data, into domain adaptation networks to improve interpretability and accuracy. Additionally, expanding the datasets' geographic scope could address larger-scale questions within urban remote sensing and motivate more advanced multimodal fusion techniques.
In conclusion, this paper strengthens the capability of AI-based remote sensing technologies to transition from single-city effectiveness to broader urban applicability, marking a substantial contribution to cross-domain semantic segmentation efforts in remote sensing.