CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation (2410.22629v2)
Abstract: Remote Sensing Domain Generalization (RSDG) has emerged as a critical and valuable research frontier, focusing on developing models that generalize effectively across diverse scenarios. Although RS images exhibit substantial domain gaps arising from variations in location, wavelength, and sensor type, research in this area remains underexplored: (1) current cross-domain methods focus primarily on Domain Adaptation (DA), which adapts models to predefined target domains rather than to unseen ones; (2) few studies target the RSDG problem, especially for semantic segmentation, and existing models are tailored to specific unknown domains, so they underfit on other unseen scenarios; (3) existing RS foundation models tend to prioritize in-domain performance over cross-domain generalization. To this end, we introduce CrossEarth, the first vision foundation model for RSDG semantic segmentation. CrossEarth achieves strong cross-domain generalization through a specially designed data-level Earth-Style Injection pipeline and a model-level Multi-Task Training pipeline. In addition, for the semantic segmentation task, we have curated an RSDG benchmark comprising 28 cross-domain settings spanning diverse regions, spectral bands, platforms, and climates, providing a comprehensive framework for evaluating the generalizability of future RSDG models. Extensive experiments on this benchmark demonstrate the superiority of CrossEarth over existing state-of-the-art methods.
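The abstract does not detail how the data-level Earth-Style Injection pipeline works. One common way to inject a target domain's "style" into source images at the data level is low-frequency Fourier amplitude mixing (as in Fourier Domain Adaptation, FDA): the amplitude spectrum carries style/appearance while the phase carries scene layout. The sketch below is an illustrative stand-in under that assumption, not CrossEarth's actual implementation; the function name and the `beta` bandwidth parameter are hypothetical.

```python
import numpy as np

def fourier_style_mix(content, style, beta=0.05):
    """Swap the low-frequency amplitude of `content` with that of `style`.

    content, style: float arrays of shape (H, W, C) with values in [0, 1].
    beta: fraction of the spectrum (per side) treated as low frequency.
    """
    fc = np.fft.fft2(content, axes=(0, 1))
    fs = np.fft.fft2(style, axes=(0, 1))
    amp_c, pha_c = np.abs(fc), np.angle(fc)
    amp_s = np.abs(fs)

    # Centre the spectra so low frequencies sit in the middle.
    amp_c = np.fft.fftshift(amp_c, axes=(0, 1))
    amp_s = np.fft.fftshift(amp_s, axes=(0, 1))
    h, w = content.shape[:2]
    b = int(min(h, w) * beta)
    ch, cw = h // 2, w // 2
    # Replace the low-frequency amplitude (style) of the content image.
    amp_c[ch - b:ch + b, cw - b:cw + b] = amp_s[ch - b:ch + b, cw - b:cw + b]
    amp_c = np.fft.ifftshift(amp_c, axes=(0, 1))

    # Recombine the mixed amplitude with the original phase (scene layout).
    mixed = np.fft.ifft2(amp_c * np.exp(1j * pha_c), axes=(0, 1))
    return np.clip(np.real(mixed), 0.0, 1.0)

# Toy usage: transfer the "style" of one random image onto another.
rng = np.random.default_rng(0)
src = rng.random((64, 64, 3))
tgt = rng.random((64, 64, 3))
out = fourier_style_mix(src, tgt)
```

With `beta=0` no frequencies are swapped and the input is returned unchanged; larger `beta` transfers more of the target's appearance at the cost of low-frequency content fidelity.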
- M. Reichstein, G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais, and f. Prabhat, “Deep learning and process understanding for data-driven earth system science,” Nature, vol. 566, no. 7743, pp. 195–204, 2019.
- M. Weiss, F. Jacob, and G. Duveiller, “Remote sensing for agricultural applications: A meta-review,” Remote Sensing of Environment, vol. 236, p. 111402, 2020.
- Z. Zhu, Y. Zhou, K. C. Seto, E. C. Stokes, C. Deng, S. T. Pickett, and H. Taubenböck, “Understanding an urbanizing planet: Strategic directions for remote sensing,” Remote Sensing of Environment, vol. 228, pp. 164–182, 2019.
- Z. Rui and L. Jintao, “A survey on algorithm research of scene parsing based on deep learning,” Journal of Computer Research and Development, vol. 57, no. 4, pp. 859–875, 2020.
- A. Abdollahi, B. Pradhan, N. Shukla, S. Chakraborty, and A. Alamri, “Multi-object segmentation in complex urban scenes from high-resolution remote sensing data,” Remote Sensing, vol. 13, no. 18, p. 3710, 2021.
- Q. Yuan, H. Shen, T. Li, Z. Li, S. Li, Y. Jiang, H. Xu, W. Tan, Q. Yang, J. Wang, J. Gao, and L. Zhang, “Deep learning in environmental remote sensing: Achievements and challenges,” Remote Sensing of Environment, vol. 241, p. 111716, 2020.
- F. Dell’Acqua and P. Gamba, “Remote sensing and earthquake damage assessment: Experiences, limits, and perspectives,” Proceedings of the IEEE, vol. 100, no. 10, pp. 2876–2890, 2012.
- J. Song, H. Chen, W. Xuan, J. Xia, and N. Yokoya, “Synrs3d: A synthetic dataset for global 3d semantic understanding from monocular remote sensing imagery,” arXiv preprint arXiv:2406.18151, 2024.
- H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in CVPR, 2017, pp. 6230–6239.
- L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in ECCV, 2018, pp. 801–818.
- J. Fu, J. Liu, H. Tian, Y. Li, Y. Bao, Z. Fang, and H. Lu, “Dual attention network for scene segmentation,” in CVPR, 2019, pp. 3141–3149.
- X. Li, H. He, X. Li, D. Li, G. Cheng, J. Shi, L. Weng, Y. Tong, and Z. Lin, “Pointflow: Flowing semantics through points for aerial image segmentation,” in CVPR, June 2021, pp. 4217–4226.
- Z. Zheng, Y. Zhong, J. Wang, and A. Ma, “Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery,” in CVPR, 2020, pp. 4095–4104.
- Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “Farseg++: Foreground-aware relation network for geospatial object segmentation in high spatial resolution remote sensing imagery,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 11, pp. 13 715–13 729, 2023.
- A. Ma, J. Wang, Y. Zhong, and Z. Zheng, “Factseg: Foreground activation-driven small object semantic segmentation in large-scale remote sensing imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022.
- L. Wang, R. Li, C. Zhang, S. Fang, C. Duan, X. Meng, and P. M. Atkinson, “UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 190, pp. 196–214, 2022.
- D. Hong, B. Zhang, H. Li, Y. Li, J. Yao, C. Li, M. Werner, J. Chanussot, A. Zipf, and X. X. Zhu, “Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks,” Remote Sensing of Environment, vol. 299, p. 113856, 2023.
- Y. Cai, Y. Yang, Y. Shang, Z. Chen, Z. Shen, and J. Yin, “Iterdanet: Iterative intra-domain adaptation for semantic segmentation of remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–17, 2022.
- L. Bai, S. Du, X. Zhang, H. Wang, B. Liu, and S. Ouyang, “Domain adaptation for remote sensing image semantic segmentation: An integrated approach of contrastive learning and adversarial learning,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022.
- H. Chen, H. Zhang, G. Yang, S. Li, and L. Zhang, “A mutual information domain adaptation network for remotely sensed semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–16, 2022.
- F. Schenkel and W. Middelmann, “Domain adaptation for semantic segmentation using convolutional neural networks,” in IGARSS, 2019, pp. 728–731.
- S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” NeurIPS, vol. 19, 2006.
- Y. Ganin and V. Lempitsky, “Unsupervised domain adaptation by backpropagation,” in ICML, 2015, pp. 1180–1189.
- Y. Zou, Z. Yu, B. Kumar, and J. Wang, “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training,” in ECCV, 2018, pp. 289–305.
- Y. Li, L. Yuan, and N. Vasconcelos, “Bidirectional learning for domain adaptation of semantic segmentation,” in CVPR, 2019, pp. 6936–6945.
- X. Ma, Z. Wang, Y. Zhan, Y. Zheng, Z. Wang, D. Dai, and C.-W. Lin, “Both style and fog matter: Cumulative domain adaptation for semantic foggy scene understanding,” in CVPR, 2022, pp. 18 922–18 931.
- Z. Gong, F. Li, Y. Deng, D. Bhattacharjee, X. Zhu, and Z. Ji, “Coda: Instructive chain-of-domain adaptation with severity-aware visual prompt tuning,” arXiv preprint arXiv:2403.17369, 2024.
- Z. Gong, F. Li, Y. Deng, W. Shen, X. Ma, Z. Ji, and N. Xia, “Train one, generalize to all: Generalizable semantic segmentation from single-scene to all adverse scenes,” in ACM MM, 2023, pp. 2275–2284.
- F. Li, Z. Gong, Y. Deng, X. Ma, R. Zhang, Z. Ji, X. Zhu, and H. Zhang, “Parsing all adverse scenes: Severity-aware semantic segmentation with mask-enhanced cross-domain consistency,” in AAAI, vol. 38, no. 12, 2024, pp. 13 483–13 491.
- A. Xiao, J. Huang, W. Xuan, R. Ren, K. Liu, D. Guan, A. El Saddik, S. Lu, and E. P. Xing, “3d semantic segmentation in the wild: Learning generalized models for adverse-condition point clouds,” in CVPR, 2023, pp. 9382–9392.
- Q. Bi, S. You, and T. Gevers, “Learning content-enhanced mask transformer for domain generalized urban-scene segmentation,” in AAAI, vol. 38, no. 2, 2024, pp. 819–827.
- ——, “Generalized foggy-scene semantic segmentation by frequency decoupling,” in CVPR, 2024, pp. 1389–1399.
- J. Wang, C. Lan, C. Liu, Y. Ouyang, T. Qin, W. Lu, Y. Chen, W. Zeng, and S. Y. Philip, “Generalizing to unseen domains: A survey on domain generalization,” IEEE Transactions on Knowledge and Data Engineering, vol. 35, no. 8, pp. 8052–8072, 2022.
- H. Li, S. J. Pan, S. Wang, and A. C. Kot, “Domain generalization with adversarial feature learning,” in CVPR, 2018, pp. 5400–5409.
- J. Lambert, Z. Liu, O. Sener, J. Hays, and V. Koltun, “Mseg: A composite dataset for multi-domain semantic segmentation,” in CVPR, 2020, pp. 2879–2888.
- K. Zhou, Z. Liu, Y. Qiao, T. Xiang, and C. C. Loy, “Domain generalization: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 4, pp. 4396–4415, 2022.
- C. Liang, W. Li, Y. Dong, and W. Fu, “Single domain generalization method for remote sensing image segmentation via category consistency on domain randomization,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
- M. Luo, S. Ji, and S. Wei, “A diverse large-scale building dataset and a novel plug-and-play domain generalization method for building extraction,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 16, pp. 4122–4138, 2023.
- R. Iizuka, J. Xia, and N. Yokoya, “Frequency-based optimal style mix for domain generalization in semantic segmentation of remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
- R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill et al., “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021.
- ——, “On the opportunities and risks of foundation models,” arXiv preprint arXiv:2108.07258, 2021.
- D. Wang, J. Zhang, B. Du, G.-S. Xia, and D. Tao, “An empirical study of remote sensing pretraining,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–20, 2023.
- X. Sun, P. Wang, W. Lu, Z. Zhu, X. Lu, Q. He, J. Li, X. Rong, Z. Yang, H. Chang, Q. He, G. Yang, R. Wang, J. Lu, and K. Fu, “RingMo: A remote sensing foundation model with masked image modeling,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–22, 2023.
- D. Wang, Q. Zhang, Y. Xu, J. Zhang, B. Du, D. Tao, and L. Zhang, “Advancing plain vision transformer toward remote sensing foundation model,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023.
- K. Cha, J. Seo, and T. Lee, “A billion-scale foundation model for remote sensing images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, pp. 1–17, 2024.
- K. Chen, C. Liu, H. Chen, H. Zhang, W. Li, Z. Zou, and Z. Shi, “Rsprompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
- X. Guo, J. Lao, B. Dang, Y. Zhang, L. Yu, L. Ru, L. Zhong, Z. Huang, K. Wu, D. Hu, H. He, J. Wang, J. Chen, M. Yang, Y. Zhang, and Y. Li, “Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery,” in CVPR, 2024, pp. 27 672–27 683.
- D. Hong, B. Zhang, X. Li, Y. Li, C. Li, J. Yao, N. Yokoya, H. Li, P. Ghamisi, X. Jia, A. Plaza, P. Gamba, J. A. Benediktsson, and J. Chanussot, “SpectralGPT: Spectral remote sensing foundation model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–18, 2024.
- W. Yang, Y. Hou, L. Liu, Y. Liu, X. Li et al., “SARATR-X: A foundation model for synthetic aperture radar images target recognition,” arXiv e-prints, pp. arXiv–2405, 2024.
- D. Wang, M. Hu, Y. Jin, Y. Miao, J. Yang, Y. Xu, X. Qin, J. Ma, L. Sun, C. Li, C. Fu, H. Chen, C. Han, N. Yokoya, J. Zhang, M. Xu, L. Liu, L. Zhang, C. Wu, B. Du, D. Tao, and L. Zhang, “Hypersigma: Hyperspectral intelligence comprehension foundation model,” arXiv preprint arXiv:2406.11519, 2024.
- D. Wang, J. Zhang, M. Xu, L. Liu, D. Wang, E. Gao, C. Han, H. Guo, B. Du, D. Tao et al., “Mtp: Advancing remote sensing foundation model via multi-task pretraining,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024.
- K. Li, X. Cao, and D. Meng, “A new learning paradigm for foundation model-based remote-sensing change detection,” IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1–12, 2024.
- M. Mendieta, B. Han, X. Shi, Y. Zhu, and C. Chen, “Towards geospatial foundation models via continual pretraining,” in ICCV, 2023, pp. 16 806–16 816.
- Z. Dong, Y. Gu, and T. Liu, “Upetu: A unified parameter-efficient fine-tuning framework for remote sensing foundation model,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
- D. Wang, J. Zhang, B. Du, M. Xu, L. Liu, D. Tao, and L. Zhang, “SAMRS: Scaling-up remote sensing segmentation dataset with segment anything model,” in NeurIPS, vol. 36, 2023, pp. 8815–8827.
- C. J. Reed, X. Yue, A. Nrusimha, S. Ebrahimi, V. Vijaykumar, R. Mao, B. Li, S. Zhang, D. Guillory, S. Metzger et al., “Self-supervised pretraining improves self-supervised pretraining,” in WACV, 2022, pp. 2584–2594.
- Y. Cong, S. Khanna, C. Meng, P. Liu, E. Rozi, Y. He, M. Burke, D. Lobell, and S. Ermon, “Satmae: Pre-training transformers for temporal and multi-spectral satellite imagery,” NeurIPS, vol. 35, pp. 197–211, 2022.
- J. Wang, C. Lan, C. Liu, Y. Ouyang, T. Qin, W. Lu, Y. Chen, W. Zeng, and S. Y. Philip, “Generalizing to unseen domains: A survey on domain generalization,” IEEE Transactions on Knowledge and data engineering, vol. 35, no. 8, pp. 8052–8072, 2022.
- M. Oquab, T. Darcet, T. Moutakanni, H. Vo, M. Szafraniec, V. Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Nouby et al., “Dinov2: Learning robust visual features without supervision,” arXiv preprint arXiv:2304.07193, 2023.
- J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in CVPR, 2015, pp. 3431–3440.
- Y.-H. Tsai, W.-C. Hung, S. Schulter, K. Sohn, M.-H. Yang, and M. Chandraker, “Learning to adapt structured output space for semantic segmentation,” in CVPR, 2018, pp. 7472–7481.
- D. Nilsson, A. Pirinen, E. Gärtner, and C. Sminchisescu, “Embodied visual active learning for semantic segmentation,” in AAAI, vol. 35, no. 3, 2021, pp. 2373–2383.
- S. Ainetter and F. Fraundorfer, “End-to-end trainable deep neural network for robotic grasp detection and semantic segmentation from rgb,” in ICRA, 2021, pp. 13 452–13 458.
- A. Hatamizadeh, V. Nath, Y. Tang, D. Yang, H. R. Roth, and D. Xu, “Swin unetr: Swin transformers for semantic segmentation of brain tumors in mri images,” in MICCAI Workshop, 2021, pp. 272–284.
- F. Shamshad, S. Khan, S. W. Zamir, M. H. Khan, M. Hayat, F. S. Khan, and H. Fu, “Transformers in medical imaging: A survey,” Medical Image Analysis, vol. 88, p. 102802, 2023.
- A. Hatamizadeh, Y. Tang, V. Nath, D. Yang, A. Myronenko, B. Landman, H. R. Roth, and D. Xu, “Unetr: Transformers for 3d medical image segmentation,” in WACV, 2022, pp. 574–584.
- M. A. Mazurowski, H. Dong, H. Gu, J. Yang, N. Konz, and Y. Zhang, “Segment anything model for medical image analysis: an experimental study,” Medical Image Analysis, vol. 89, p. 102918, 2023.
- Q. Bi, J. Yi, H. Zheng, W. Ji, Y. Huang, Y. Li, and Y. Zheng, “Learning generalized medical image segmentation from decoupled feature queries,” in AAAI, vol. 38, no. 2, 2024, pp. 810–818.
- A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y. Lo et al., “Segment anything,” in ICCV, 2023, pp. 4015–4026.
- A. Xiao, W. Xuan, H. Qi, Y. Xing, R. Ren, X. Zhang, and S. Lu, “Cat-sam: Conditional tuning network for few-shot adaptation of segmentation anything model,” arXiv preprint arXiv:2402.03631, 2024.
- L. Ke, M. Ye, M. Danelljan, Y.-W. Tai, C.-K. Tang, F. Yu et al., “Segment anything in high quality,” NeurIPS, vol. 36, 2024.
- J. Lv, Q. Shen, M. Lv, Y. Li, L. Shi, and P. Zhang, “Deep learning-based semantic segmentation of remote sensing images: a review,” Frontiers in Ecology and Evolution, vol. 11, p. 1201125, 2023.
- X.-Y. Tong, G.-S. Xia, Q. Lu, H. Shen, S. Li, S. You, and L. Zhang, “Land-cover classification with high-resolution remote sensing images using transferable deep models,” Remote Sensing of Environment, vol. 237, p. 111322, 2020.
- S. Subudhi, R. N. Patro, P. K. Biswal, and F. Dell’Acqua, “A survey on superpixel segmentation as a preprocessing step in hyperspectral image analysis,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 5015–5035, 2021.
- X. Zhang, P. Xiao, and X. Feng, “Object-specific optimization of hierarchical multiscale segmentations for high-spatial resolution remote sensing images,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 159, pp. 308–321, 2020.
- X. Zheng, L. Huan, G.-S. Xia, and J. Gong, “Parsing very high resolution urban scene images by learning deep convnets with edge-aware loss,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 170, pp. 15–28, 2020.
- H.-F. Zhong, Q. Sun, H.-M. Sun, and R.-S. Jia, “Nt-net: A semantic segmentation network for extracting lake water bodies from optical remote sensing images based on transformer,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022.
- X. Li, F. Xu, F. Liu, X. Lyu, Y. Tong, Z. Xu, and J. Zhou, “A synergistical attention model for semantic segmentation of remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023.
- H. Xu, X. Tang, B. Ai, F. Yang, Z. Wen, and X. Yang, “Feature-selection high-resolution network with hypersphere embedding for semantic segmentation of vhr remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–15, 2022.
- J. Wang, K. Sun, T. Cheng, B. Jiang, C. Deng, Y. Zhao, D. Liu, Y. Mu, M. Tan, X. Wang, W. Liu, and B. Xiao, “Deep high-resolution representation learning for visual recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3349–3364, 2021.
- O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in MICCAI, 2015, pp. 234–241.
- A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
- H. Nam, H. Lee, J. Park, W. Yoon, and D. Yoo, “Reducing domain gap by reducing style bias,” in CVPR, 2021, pp. 8690–8699.
- E. Romera, L. M. Bergasa, K. Yang, J. M. Alvarez, and R. Barea, “Bridging the day and night domain gap for semantic segmentation,” in IV, 2019, pp. 1312–1318.
- K. Regmi and M. Shah, “Bridging the domain gap for ground-to-aerial image matching,” in ICCV, 2019, pp. 470–479.
- Z. Tang, B. Pan, E. Liu, X. Xu, T. Shi, and Z. Shi, “Srda-net: super-resolution domain adaptation networks for semantic segmentation,” arXiv preprint arXiv:2005.06382, 2020.
- N. Bengana and J. Heikkilä, “Improving land cover segmentation across satellites using domain adaptation,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 1399–1410, 2020.
- B. Zhang, T. Chen, and B. Wang, “Curriculum-style local-to-global adaptation for cross-domain remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–12, 2021.
- O. Tasar, S. Happy, Y. Tarabalka, and P. Alliez, “Colormapgan: Unsupervised domain adaptation for semantic segmentation using color mapping generative adversarial networks,” IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 10, pp. 7178–7193, 2020.
- C. Ayala, R. Sesma, C. Aranda, and M. Galar, “Diffusion models for remote sensing imagery semantic segmentation,” in IGARSS, 2023, pp. 5654–5657.
- C. Zhao, Y. Ogawa, S. Chen, Z. Yang, and Y. Sekimoto, “Label freedom: Stable diffusion for remote sensing image semantic segmentation data generation,” in IEEE BigData, 2023, pp. 1022–1030.
- X. Ma, X. Zhang, Z. Wang, and M.-O. Pun, “Unsupervised domain adaptation augmented by mutually boosted attention for semantic segmentation of vhr remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023.
- L. Shi, Z. Wang, B. Pan, and Z. Shi, “An end-to-end network for remote sensing imagery semantic segmentation via joint pixel-and representation-level domain adaptation,” IEEE Geoscience and Remote Sensing Letters, vol. 18, no. 11, pp. 1896–1900, 2020.
- W. Liu and F. Su, “Unsupervised adversarial domain adaptation network for semantic segmentation,” IEEE Geoscience and Remote Sensing Letters, vol. 17, no. 11, pp. 1978–1982, 2019.
- O. Tasar, Y. Tarabalka, A. Giros, P. Alliez, and S. Clerc, “Standardgan: Multi-source domain adaptation for semantic segmentation of very high resolution satellite images by data standardization,” in CVPR Workshops, 2020, pp. 192–193.
- B. Benjdira, A. Ammar, A. Koubaa, and K. Ouni, “Data-efficient domain adaptation for semantic segmentation of aerial imagery using generative adversarial networks,” Applied Sciences, vol. 10, no. 3, p. 1092, 2020.
- Y. Li, T. Shi, Y. Zhang, and J. Ma, “Spgan-da: Semantic-preserved generative adversarial network for domain adaptive remote sensing image semantic segmentation,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, vol. 63, no. 11, pp. 139–144, 2020.
- J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” NeurIPS, vol. 33, pp. 6840–6851, 2020.
- R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in CVPR, 2022, pp. 10 684–10 695.
- Z. Xi, X. He, Y. Meng, A. Yue, J. Chen, Y. Deng, and J. Chen, “A multilevel-guided curriculum domain adaptation approach to semantic segmentation for high-resolution remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
- L. Zhang, M. Lan, J. Zhang, and D. Tao, “Stagewise unsupervised domain adaptation with adversarial self-training for road segmentation of remote-sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2021.
- W. Liu, Z. Luo, Y. Cai, Y. Yu, Y. Ke, J. M. Junior, W. N. Gonçalves, and J. Li, “Adversarial unsupervised domain adaptation for 3d semantic segmentation with multi-modal learning,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 176, pp. 211–221, 2021.
- X. Deng, H. L. Yang, N. Makkar, and D. Lunga, “Large scale unsupervised domain adaptation of segmentation networks with adversarial learning,” in IGARSS, 2019, pp. 4955–4958.
- J. Zhu, Y. Guo, G. Sun, L. Yang, M. Deng, and J. Chen, “Unsupervised domain adaptation semantic segmentation of high-resolution remote sensing imagery with invariant domain-level prototype memory,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–18, 2023.
- J. Chen, P. He, J. Zhu, Y. Guo, G. Sun, M. Deng, and H. Li, “Memory-contrastive unsupervised domain adaptation for building extraction of high-resolution remote sensing imagery,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–15, 2023.
- S. F. Ismael, K. Kayabol, and E. Aptoula, “Unsupervised domain adaptation for the semantic segmentation of remote sensing images via one-shot image-to-image translation,” IEEE Geoscience and Remote Sensing Letters, vol. 20, pp. 1–5, 2023.
- O. Tasar, A. Giros, Y. Tarabalka, P. Alliez, and S. Clerc, “Daugnet: Unsupervised, multisource, multitarget, and life-long domain adaptation for semantic segmentation of satellite images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 2, pp. 1067–1081, 2020.
- K. Gao, A. Yu, X. You, C. Qiu, and B. Liu, “Prototype and context-enhanced learning for unsupervised domain adaptation semantic segmentation of remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023.
- O. Tasar, S. Happy, Y. Tarabalka, and P. Alliez, “Semi2i: Semantically consistent image-to-image translation for domain adaptation of remote sensing data,” in IGARSS, 2020, pp. 1837–1840.
- T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” NeurIPS, vol. 33, pp. 1877–1901, 2020.
- A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann et al., “Palm: Scaling language modeling with pathways,” Journal of Machine Learning Research, vol. 24, no. 240, pp. 1–113, 2023.
- H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023.
- X. Wang, W. Wang, Y. Cao, C. Shen, and T. Huang, “Images speak in images: A generalist painter for in-context visual learning,” in CVPR, 2023, pp. 6830–6839.
- A. Bar, Y. Gandelsman, T. Darrell, A. Globerson, and A. Efros, “Visual prompting via image inpainting,” NeurIPS, vol. 35, pp. 25 005–25 017, 2022.
- X. Wang, X. Zhang, Y. Cao, W. Wang, C. Shen, and T. Huang, “Seggpt: Segmenting everything in context,” arXiv preprint arXiv:2304.03284, 2023.
- Y. Bai, X. Geng, K. Mangalam, A. Bar, A. L. Yuille, T. Darrell, J. Malik, and A. A. Efros, “Sequential modeling enables scalable learning for large vision models,” in CVPR, 2024, pp. 22 861–22 872.
- W. Wang, J. Dai, Z. Chen, Z. Huang, Z. Li, X. Zhu, X. Hu, T. Lu, L. Lu, H. Li et al., “Internimage: Exploring large-scale vision foundation models with deformable convolutions,” in CVPR, 2023, pp. 14 408–14 419.
- K. He, X. Chen, S. Xie, Y. Li, P. Dollár, and R. Girshick, “Masked autoencoders are scalable vision learners,” in CVPR, 2022, pp. 16 000–16 009.
- H. Liu, C. Li, Q. Wu, and Y. J. Lee, “Visual instruction tuning,” NeurIPS, vol. 36, 2024.
- W. Wang, Z. Chen, X. Chen, J. Wu, X. Zhu, G. Zeng, P. Luo, T. Lu, J. Zhou, Y. Qiao et al., “Visionllm: Large language model is also an open-ended decoder for vision-centric tasks,” NeurIPS, vol. 36, 2024.
- Z. Chen, J. Wu, W. Wang, W. Su, G. Chen, S. Xing, M. Zhong, Q. Zhang, X. Zhu, L. Lu et al., “Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks,” in CVPR, 2024, pp. 24 185–24 198.
- J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat et al., “Gpt-4 technical report,” arXiv preprint arXiv:2303.08774, 2023.
- F. Liu, D. Chen, Z. Guan, X. Zhou, J. Zhu, Q. Ye, L. Fu, and J. Zhou, “Remoteclip: A vision language foundation model for remote sensing,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
- K. Kuckreja, M. S. Danish, M. Naseer, A. Das, S. Khan, and F. S. Khan, “Geochat: Grounded large vision-language model for remote sensing,” in CVPR, 2024, pp. 27 831–27 840.
- J. Zhang, Z. Zhou, G. Mai, L. Mu, M. Hu, and S. Li, “Text2seg: Remote sensing image semantic segmentation via text-guided visual foundation models,” arXiv preprint arXiv:2304.10597, 2023.
- W. Zhang, M. Cai, T. Zhang, Y. Zhuang, and X. Mao, “Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain,” IEEE Transactions on Geoscience and Remote Sensing, 2024.
- Y. Zhan, Z. Xiong, and Y. Yuan, “Skyeyegpt: Unifying remote sensing vision-language tasks via instruction tuning with large language model,” arXiv preprint arXiv:2401.09712, 2024.
- Y. Hu, J. Yuan, C. Wen, X. Lu, and X. Li, “Rsgpt: A remote sensing vision language model and benchmark,” arXiv preprint arXiv:2307.15266, 2023.
- U. Mall, C. P. Phoo, M. K. Liu, C. Vondrick, B. Hariharan, and K. Bala, “Remote sensing vision-language foundation models without annotations via ground remote alignment,” arXiv preprint arXiv:2312.06960, 2023.
- X. Li, C. Wen, Y. Hu, and N. Zhou, “Rs-clip: Zero shot remote sensing scene classification via contrastive vision-language supervision,” International Journal of Applied Earth Observation and Geoinformation, vol. 124, p. 103497, 2023.
- C. Pang, J. Wu, J. Li, Y. Liu, J. Sun, W. Li, X. Weng, S. Wang, L. Feng, G.-S. Xia et al., “H2rsvlm: Towards helpful and honest remote sensing large vision language model,” arXiv preprint arXiv:2403.20213, 2024.
- C. Chappuis, V. Zermatten, S. Lobry, B. Le Saux, and D. Tuia, “Prompt-rsvqa: Prompting visual context to a language model for remote sensing visual question answering,” in CVPR, 2022, pp. 1372–1381.
- Z. Yu, C. Liu, L. Liu, Z. Shi, and Z. Zou, “Metaearth: A generative foundation model for global-scale remote sensing image generation,” arXiv preprint arXiv:2405.13570, 2024.
- A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021, pp. 8748–8763.
- L. Scheibenreif, M. Mommert, and D. Borth, “Parameter efficient self-supervised geospatial domain adaptation,” in CVPR, 2024, pp. 27 841–27 851.
- P. W. Koh, S. Sagawa, H. Marklund, S. M. Xie, M. Zhang, A. Balsubramani, W. Hu, M. Yasunaga, R. L. Phillips, I. Gao et al., “Wilds: A benchmark of in-the-wild distribution shifts,” in ICML, 2021, pp. 5637–5664.
- H. Yao, X. Yang, X. Pan, S. Liu, P. W. Koh, and C. Finn, “Improving domain generalization with domain relations,” in ICLR.
- Y. Zhao, Z. Zhong, N. Zhao, N. Sebe, and G. H. Lee, “Style-hallucinated dual consistency learning: A unified framework for visual domain generalization,” International Journal of Computer Vision, vol. 132, no. 3, pp. 837–853, 2024.
- Y. Long, G.-S. Xia, S. Li, W. Yang, M. Y. Yang, X. X. Zhu, L. Zhang, and D. Li, “On creating benchmark dataset for aerial image interpretation: Reviews, guidances and million-aid,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 4205–4230, 2021.
- L. Hoyer, D. Dai, H. Wang, and L. Van Gool, “Mic: Masked image consistency for context-enhanced domain adaptation,” in CVPR, 2023, pp. 11 721–11 732.
- P. T. Jackson, A. A. Abarghouei, S. Bonner, T. P. Breckon, and B. Obara, “Style augmentation: data augmentation via style randomization.” in CVPR workshops, vol. 6, 2019, pp. 10–11.
- J. Wang, Z. Zheng, A. Ma, X. Lu, and Y. Zhong, “Loveda: A remote sensing land-cover dataset for domain adaptive semantic segmentation,” arXiv preprint arXiv:2110.08733, 2021.
- W. Chen, Z. Jiang, Z. Wang, K. Cui, and X. Qian, “Collaborative global-local networks for memory-efficient segmentation of ultra-high resolution images,” in CVPR, 2019.
- V. Mnih, “Machine learning for aerial image labeling,” Ph.D. dissertation, University of Toronto, 2013.
- M. Rahnemoonfar, T. Chowdhury, and R. Murphy, “Rescuenet: A high resolution uav semantic segmentation benchmark dataset for natural disaster damage assessment,” arXiv preprint arXiv:2202.12361, 2022.
- S. Ji, S. Wei, and M. Lu, “Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set,” IEEE Transactions on Geoscience and Remote Sensing, vol. 57, no. 1, pp. 574–586, 2018.
- S. Liu, L. Chen, L. Zhang, J. Hu, and Y. Fu, “A large-scale climate-aware satellite image dataset for domain adaptive land-cover semantic segmentation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 205, pp. 98–114, 2023.
- B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, “Masked-attention mask transformer for universal image segmentation,” in CVPR, 2022, pp. 1290–1299.
- Z. Wei, L. Chen, Y. Jin, X. Ma, T. Liu, P. Ling, B. Wang, H. Chen, and J. Zheng, “Stronger fewer & superior: Harnessing vision foundation models for domain generalized semantic segmentation,” in CVPR, June 2024, pp. 28 619–28 630.
- Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural computation, vol. 1, no. 4, pp. 541–551, 1989.
- X. Glorot, A. Bordes, and Y. Bengio, “Deep sparse rectifier neural networks,” in AISTATS, 2011, pp. 315–323.
- Z. Huang, X. Wang, L. Huang, C. Huang, Y. Wei, and W. Liu, “Ccnet: Criss-cross attention for semantic segmentation,” in ICCV, 2019, pp. 603–612.
- F. Milletari, N. Navab, and S.-A. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 3DV, 2016, pp. 565–571.
- L. Hoyer, D. Dai, and L. Van Gool, “Daformer: Improving network architectures and training strategies for domain-adaptive semantic segmentation,” in CVPR, 2022, pp. 9924–9935.
- E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” NeurIPS, vol. 34, pp. 12077–12090, 2021.
- L. Hoyer, D. Dai, and L. Van Gool, “Hrda: Context-aware high-resolution domain-adaptive semantic segmentation,” in ECCV, 2022, pp. 372–391.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778.
- P. Zhang, B. Zhang, T. Zhang, D. Chen, Y. Wang, and F. Wen, “Prototypical pseudo label denoising and target structure learning for domain adaptive semantic segmentation,” in CVPR, 2021, pp. 12414–12424.
- H. Wang, T. Shen, W. Zhang, L.-Y. Duan, and T. Mei, “Classes matter: A fine-grained adversarial approach to cross-domain semantic segmentation,” in ECCV, 2020, pp. 642–659.
- X. Chen, S. Pan, and Y. Chong, “Unsupervised domain adaptation for remote sensing image semantic segmentation using region and category adaptive domain discriminator,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–13, 2022.
- H. Ni, Q. Liu, H. Guan, H. Tang, and J. Chanussot, “Category-level assignment for cross-domain semantic segmentation in remote sensing images,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
- T.-H. Vu, H. Jain, M. Bucher, M. Cord, and P. Pérez, “Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation,” in CVPR, 2019, pp. 2517–2526.
- F. Zhang, Y. Shi, Z. Xiong, W. Huang, and X. X. Zhu, “Pseudo features guided self-training for domain adaptive semantic segmentation of satellite images,” IEEE Transactions on Geoscience and Remote Sensing, 2023.
- C. Liang, B. Cheng, B. Xiao, Y. Dong, and J. Chen, “Multilevel heterogeneous domain adaptation method for remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 61, pp. 1–16, 2023.
- C. Liang, B. Cheng, B. Xiao, and Y. Dong, “Unsupervised domain adaptation for remote sensing image segmentation based on adversarial learning and self-training,” IEEE Geoscience and Remote Sensing Letters, 2023.
- I. Demir, K. Koperski, D. Lindenbaum, G. Pang, J. Huang, S. Basu, F. Hughes, D. Tuia, and R. Raskar, “Deepglobe 2018: A challenge to parse the earth through satellite images,” in CVPR workshops, 2018, pp. 172–181.
- L. Hoyer, D. Dai, and L. Van Gool, “Domain adaptive and generalizable network architectures and training strategies for semantic image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
- I. Loshchilov and F. Hutter, “Decoupled weight decay regularization,” arXiv preprint arXiv:1711.05101, 2017.
- L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
- K. Saito, K. Watanabe, Y. Ushiku, and T. Harada, “Maximum classifier discrepancy for unsupervised domain adaptation,” in CVPR, 2018, pp. 3723–3732.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
- Z. Wang, M. Yu, Y. Wei, R. Feris, J. Xiong, W.-M. Hwu, T. S. Huang, and H. Shi, “Differential treatment for stuff and things: A simple unsupervised domain adaptation method for semantic segmentation,” in CVPR, 2020, pp. 12635–12644.
- J. Chen, J. Zhu, Y. Guo, G. Sun, Y. Zhang, and M. Deng, “Unsupervised domain adaptation for semantic segmentation of high-resolution remote sensing imagery driven by category-certainty attention,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–15, 2022.
- J. Chen, G. Chen, B. Fang, J. Wang, and L. Wang, “Class-aware domain adaptation for coastal land cover mapping using optical remote sensing imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 14, pp. 11800–11813, 2021.
- L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
- L. McInnes, J. Healy, and J. Melville, “UMAP: Uniform manifold approximation and projection for dimension reduction,” arXiv preprint arXiv:1802.03426, 2018.
- L. Wang, P. Xiao, X. Zhang, and X. Chen, “A fine-grained unsupervised domain adaptation framework for semantic segmentation of remote sensing images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2023.
- Y. Li, T. Shi, Y. Zhang, W. Chen, Z. Wang, and H. Li, “Learning deep semantic segmentation network under multiple weakly-supervised constraints for cross-domain remote sensing image semantic segmentation,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 175, pp. 20–33, 2021.
- Y. Zhao, P. Guo, Z. Sun, X. Chen, and H. Gao, “Residualgan: Resize-residual dualgan for cross-domain remote sensing images semantic segmentation,” Remote Sensing, vol. 15, no. 5, p. 1428, 2023.
- L. Wu, M. Lu, and L. Fang, “Deep covariance alignment for domain adaptive remote sensing image segmentation,” IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1–11, 2022.
- W. Li, H. Gao, Y. Su, and B. M. Momanyi, “Unsupervised domain adaptation for remote sensing semantic segmentation with transformer,” Remote Sensing, vol. 14, no. 19, p. 4942, 2022.
- MMSegmentation Contributors, “MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark,” https://github.com/open-mmlab/mmsegmentation, 2020.
- L. Zhou, C. Zhang, and M. Wu, “D-linknet: Linknet with pretrained encoder and dilated convolution for high resolution satellite imagery road extraction,” in CVPR workshops, 2018, pp. 182–186.