
Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation (2412.13394v2)

Published 18 Dec 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at https://github.com/microsoft/geospatial-ood-detection

Summary

  • The paper introduces TARDIS, a post-hoc method for large-scale out-of-distribution detection in Earth Observation using surrogate labels without model retraining.
  • TARDIS uses internal model activations to assign surrogate in-distribution and out-of-distribution labels to samples from an unknown distribution (WILD), so a binary OOD detector can be trained without any labeled OOD data.
  • On EuroSAT and xBD, surrogate labeling performs near the upper bound in 13 of 17 setups, and a deployment on Fields of the World shows that the method scales to large collections of Sentinel-2 imagery.

Insights into Distribution Shifts at Scale in Earth Observation

The paper "Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation" addresses a fundamental concern in deploying deep learning models for Earth Observation (EO): detecting out-of-distribution (OOD) inputs. Such inputs degrade model performance, particularly in geospatial deployments, where temporal, spatial, and environmental variability make distribution shifts pervasive.

Core Contributions

The paper introduces TARDIS (Test-time Addressing of Distribution Shifts at Scale), a post-hoc method for detecting OOD samples at scale without compromising the model's performance on its primary task. TARDIS handles unknown test-time distributions by generating surrogate labels from in-distribution (ID) data and data of unknown distribution. It uses the model's internal activations to assign surrogate ID and OOD labels, then trains a binary classifier on those labels, so no explicit OOD examples are needed during training.
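The final step, training a binary classifier on surrogate labels, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the activations have already been reduced to short feature vectors and that surrogate labels (0 for ID, 1 for OOD) are already available, and it substitutes a plain logistic regression for whatever classifier the paper uses. The names `train_logistic` and `ood_score` and the toy feature values are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(feats, labels, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = sigmoid(z) - y  # derivative of log loss w.r.t. z
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def ood_score(x, w, b):
    """Predicted probability that feature vector x is OOD."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical toy features: three surrogate-ID and three surrogate-OOD samples.
feats = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (2.0, 2.1), (1.8, 2.2), (2.2, 1.9)]
labels = [0, 0, 0, 1, 1, 1]  # 0 = surrogate ID, 1 = surrogate OOD
w, b = train_logistic(feats, labels)
```

At deployment, `ood_score` would be applied to the features of each new input, and a high score would mark the input as likely out of distribution.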

Methodological Innovation

TARDIS integrates into existing workflows: it requires neither retraining the model nor modifying its architecture. Surrogate label generation lets TARDIS work with a pre-trained model, a set of known ID samples, and samples from an unknown distribution, termed WILD samples. It clusters activation features and applies a threshold to assign each WILD sample a surrogate ID or OOD label. Across the evaluated scenarios, these surrogate labels approach the upper bound of OOD detection performance.
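One plausible reading of the clustering-and-threshold step is sketched below. The assumptions here are mine and are not confirmed by the summary: ID and WILD features are clustered together with k-means, and a WILD sample receives a surrogate ID label when the fraction of known ID samples in its cluster reaches a threshold. The function names, the parameter `id_fraction_threshold`, and the toy features are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k), key=lambda c: sum((a - m) ** 2 for a, m in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def surrogate_labels(id_feats, wild_feats, k=2, id_fraction_threshold=0.1, seed=0):
    """Label each WILD sample 0 (surrogate ID) or 1 (surrogate OOD).

    Assumption: a cluster counts as ID when the share of known ID samples
    among its members is at least id_fraction_threshold.
    """
    points = list(id_feats) + list(wild_feats)
    assign = kmeans(points, k, seed=seed)
    n_id = len(id_feats)
    labels = []
    for cluster in assign[n_id:]:
        members = [i for i, a in enumerate(assign) if a == cluster]
        id_fraction = sum(1 for i in members if i < n_id) / len(members)
        labels.append(0 if id_fraction >= id_fraction_threshold else 1)
    return labels


# Hypothetical 2-D "activation" features: ID samples sit near the origin,
# half of the WILD samples overlap them, and half lie far away.
id_feats = [(0.1 * i, 0.0) for i in range(10)]
wild_feats = [(0.05 + 0.1 * i, 0.1) for i in range(5)] + [(10 + 0.1 * i, 10.0) for i in range(5)]
labels = surrogate_labels(id_feats, wild_feats)
```

In this toy setup the WILD samples that share a cluster with ID data are labeled surrogate ID, and the isolated group is labeled surrogate OOD. Both the number of clusters and the threshold are tuning choices that this sketch fixes arbitrarily.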

Empirical Validation

The authors validate TARDIS on the EuroSAT and xBD datasets across 17 experimental setups covering both covariate and semantic shifts. The reference point is an upper bound: the performance attainable when the true ID and OOD labels are known. In 13 of the 17 setups, the surrogate labeling performs near this upper bound, and TARDIS matches the performance of top post-hoc activation- and scoring-based methods. This consistency across settings supports its use as a practical option for global deployments.

Implications and Future Directions

The deployment of TARDIS on the Fields of the World dataset, which uses Sentinel-2 imagery, demonstrates its scalability and its utility in real-world settings where models face varying distribution shifts. At this scale, organizations can deploy models that alert practitioners to likely distribution shifts, which in turn guides resource allocation for further data collection and model retraining.
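As a rough illustration of how such alerts might be produced, the sketch below aggregates per-sample OOD scores by region and flags regions where a large share of samples look out of distribution. This is a hypothetical post-processing step, not something described in the paper: the function `flag_regions`, both thresholds, and the region names are assumptions made for the example.

```python
from collections import defaultdict


def flag_regions(samples, score_threshold=0.5, rate_threshold=0.3):
    """Return {region: ood_rate} for regions whose OOD rate is high.

    samples: iterable of (region_name, ood_score) pairs, where ood_score
    is the binary classifier's predicted probability of OOD.
    """
    counts = defaultdict(lambda: [0, 0])  # region -> [n_ood, n_total]
    for region, score in samples:
        counts[region][0] += int(score >= score_threshold)
        counts[region][1] += 1
    rates = {region: n_ood / n_total for region, (n_ood, n_total) in counts.items()}
    return {region: rate for region, rate in rates.items() if rate >= rate_threshold}


# Hypothetical scores for image patches from two regions.
samples = [
    ("region_a", 0.1), ("region_a", 0.2), ("region_a", 0.3), ("region_a", 0.9),
    ("region_b", 0.8), ("region_b", 0.7), ("region_b", 0.4),
]
flagged = flag_regions(samples)
```

Here only `region_b` is flagged, since two of its three patches score above the threshold, while `region_a` stays below the rate cutoff. Flagged regions would be candidates for additional labeling or retraining.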

The implications of this research extend to improving model reliability and transparency in geospatial analytics and to reducing the risk of costly decision errors caused by model overconfidence. Integrating TARDIS into EO pipelines also supports a human-in-the-loop approach, which strengthens accountability.

Future research could build on the TARDIS framework by examining in more detail the activation patterns that drive OOD detection and by exploring its integration with real-time adaptive learning strategies. Future work could also investigate the trade-off between surrogate label granularity and computational efficiency, to better balance OOD detection performance against deployment cost.

In conclusion, the introduction and validation of TARDIS mark a significant step toward practical and scalable OOD detection in Earth Observation, with potential applications in other fields that need robust diagnostic tools for deployed ML models.
