TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation (2506.06281v1)

Published 6 Jun 2025 in cs.CV

Abstract: Modern Earth observation (EO) increasingly leverages deep learning to harness the scale and diversity of satellite imagery across sensors and regions. While recent foundation models have demonstrated promising generalization across EO tasks, many remain limited by the scale, geographical coverage, and spectral diversity of their training data, factors critical for learning globally transferable representations. In this work, we introduce TerraFM, a scalable self-supervised learning model that leverages globally distributed Sentinel-1 and Sentinel-2 imagery, combined with large spatial tiles and land-cover aware sampling to enrich spatial and semantic coverage. By treating sensing modalities as natural augmentations in our self-supervised approach, we unify radar and optical inputs via modality-specific patch embeddings and adaptive cross-attention fusion. Our training strategy integrates local-global contrastive learning and introduces a dual-centering mechanism that incorporates class-frequency-aware regularization to address long-tailed distributions in land cover. TerraFM achieves strong generalization on both classification and segmentation tasks, outperforming prior models on GEO-Bench and Copernicus-Bench. Our code and pretrained models are publicly available at: https://github.com/mbzuai-oryx/TerraFM.

Summary

The paper introduces TerraFM, a novel model that unifies multisensor data from Sentinel-1 and Sentinel-2 for robust Earth observation.
It employs self-supervised learning with modality-specific patch embeddings and adaptive cross-attention to fuse radar and optical inputs effectively.
TerraFM outperforms existing models on benchmarks like GEO-Bench and Copernicus-Bench, demonstrating improved generalization across diverse geographies.

An Overview of TerraFM: A Scalable Foundation Model for Multisensor Earth Observation

The paper introduces TerraFM, a foundation model for Earth observation (EO) built around a scalable, unified, multisensor approach. TerraFM is trained on globally distributed imagery from the Sentinel-1 and Sentinel-2 satellite missions and targets a central problem in remote sensing: existing models are limited by the scale, geographical coverage, and spectral diversity of their training data, so few generalize robustly across regions and sensors.

Core Contributions and Methodology

TerraFM seeks to address the limitations of existing EO models by integrating a self-supervised learning framework with several advanced components:

Data and Modality Integration: TerraFM is trained on globally distributed Sentinel-1 (SAR) and Sentinel-2 (optical) imagery. It uses large spatial tiles and land-cover-aware sampling to enrich spatial and semantic coverage, which the authors identify as critical for learning globally transferable representations.
Modality-Specific Embedding and Fusion: Each sensor has its own patch embedding, which maps inputs with different band counts into a common token representation. Adaptive cross-attention then fuses the radar and optical tokens, unifying the two modalities in a single model.
Dual-Centering Mechanism: Land cover data is long-tailed, with a few classes dominating the imagery. To address this, TerraFM introduces a dual-centering mechanism that incorporates class-frequency-aware regularization, which is meant to reduce the representation bias toward the most frequent land cover types.
Alignment through Self-Supervision: By considering sensing modalities as natural augmentations, TerraFM enhances the model’s ability to merge radar and optical data, facilitating improved alignment and representation learning through contrastive and self-supervised methodologies.
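The embedding-and-fusion idea above can be illustrated with a minimal sketch in plain Python. It is not the TerraFM implementation: the band counts, patch size, embedding width, random weights, and the choice to let optical tokens query radar tokens are all assumptions made for illustration, and the learned query/key/value projections and multiple heads of a real attention layer are omitted.

```python
import math
import random

def linear(x, w):
    """Project vector x (length d_in) with a weight matrix w of shape d_in x d_out."""
    return [sum(xi * w[i][j] for i, xi in enumerate(x)) for j in range(len(w[0]))]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def embed_patches(patches, w):
    """Modality-specific patch embedding: each sensor gets its own weight matrix."""
    return [linear(p, w) for p in patches]

def cross_attention(queries, keys_values):
    """Single-head cross-attention: each query token attends over the other modality."""
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        attn = softmax(scores)
        fused.append([sum(a * v[j] for a, v in zip(attn, keys_values)) for j in range(d)])
    return fused

rng = random.Random(0)
EMBED_DIM = 8                      # hypothetical embedding width
S1_CHANNELS, S2_CHANNELS = 2, 12   # assumed: two SAR polarizations, twelve optical bands
PATCH_PIXELS = 4                   # hypothetical flattened 2x2 patch
NUM_PATCHES = 5

# One embedding matrix per sensor, because the input widths differ.
w_s1 = random_matrix(S1_CHANNELS * PATCH_PIXELS, EMBED_DIM, rng)
w_s2 = random_matrix(S2_CHANNELS * PATCH_PIXELS, EMBED_DIM, rng)

s1_patches = [[rng.random() for _ in range(S1_CHANNELS * PATCH_PIXELS)]
              for _ in range(NUM_PATCHES)]
s2_patches = [[rng.random() for _ in range(S2_CHANNELS * PATCH_PIXELS)]
              for _ in range(NUM_PATCHES)]

s1_tokens = embed_patches(s1_patches, w_s1)
s2_tokens = embed_patches(s2_patches, w_s2)

# Optical tokens act as queries; radar tokens act as keys and values.
fused_tokens = cross_attention(s2_tokens, s1_tokens)
print(len(fused_tokens), len(fused_tokens[0]))  # prints "5 8"
```

Once both sensors are projected to the same width, the attention step no longer needs to know how many bands each sensor has, which is what lets a single backbone consume either input.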

Evaluation and Performance

The model is evaluated on two standard benchmarks, GEO-Bench and Copernicus-Bench. On both classification and segmentation tasks, TerraFM outperforms prior EO models and generalizes well across modalities and geographies.

TerraFM can accommodate various sensor inputs and resolutions. This matters in EO, where datasets differ widely in spatial resolution and spectral content. The model is described as scaling without overfitting, which the summary attributes to a large-tile design that balances local detail against broader semantic context.
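One common way to pair local detail with broader context in self-supervised training is multi-crop sampling, where a few large "global" crops and several small "local" crops are drawn from the same tile. The sketch below shows only that sampling step. It is a generic illustration, not TerraFM's pipeline: the tile size, crop counts, and area ranges are hypothetical values chosen for the example.

```python
import math
import random

def sample_crop(tile_size, area_range, rng):
    """Sample a square crop whose area is a random fraction of the tile area.

    Returns (top, left, side) in pixels, always fully inside the tile.
    """
    area_frac = rng.uniform(*area_range)
    side = int(round(tile_size * math.sqrt(area_frac)))
    side = max(1, min(side, tile_size))
    top = rng.randint(0, tile_size - side)
    left = rng.randint(0, tile_size - side)
    return top, left, side

def multi_crop(tile_size, n_global=2, n_local=6, seed=0):
    """Draw large global crops and small local crops from one tile.

    The area ranges below are assumptions for illustration only.
    """
    rng = random.Random(seed)
    global_crops = [sample_crop(tile_size, (0.4, 1.0), rng) for _ in range(n_global)]
    local_crops = [sample_crop(tile_size, (0.05, 0.4), rng) for _ in range(n_local)]
    return global_crops, local_crops

if __name__ == "__main__":
    # Hypothetical 512-pixel tile.
    global_crops, local_crops = multi_crop(512)
    for name, crops in (("global", global_crops), ("local", local_crops)):
        for top, left, side in crops:
            print(f"{name:6s} top={top:3d} left={left:3d} side={side:3d}")
```

A larger tile gives the local crops more distinct ground area to land on while the global crops still cover enough of the scene to carry semantic context.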

Implications and Future Directions

The implications of TerraFM are significant for the field of remote sensing and Earth observation. By providing a robust framework for integrating diverse sensor inputs, TerraFM advances the development of more accurate and generalizable models for environmental monitoring, disaster response, and land cover analysis.

Looking ahead, TerraFM may catalyze further advancements in the development of scalable AI models for EO applications. It provides a methodological foundation for future research focused on integrating even more complex data sources, such as hyperspectral or LiDAR data, into foundational Earth monitoring models. The work also invites exploration into real-time EO applications, potentially setting the stage for advancements in climate monitoring and urban development planning.

In conclusion, TerraFM represents a noteworthy step forward in creating more robust and adaptable models for EO, with potential applications spanning numerous critical areas. By integrating self-supervised learning with sophisticated data fusion techniques, TerraFM lays the groundwork for future innovations in satellite data analysis and interpretation.