Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation (2412.13394v2)

Published 18 Dec 2024 in cs.CV, cs.AI, and cs.LG

Abstract: Training robust deep learning models is crucial in Earth Observation, where globally deployed models often face distribution shifts that degrade performance, especially in low-data regions. Out-of-distribution (OOD) detection addresses this by identifying inputs that deviate from in-distribution (ID) data. However, existing methods either assume access to OOD data or compromise primary task performance, limiting real-world use. We introduce TARDIS, a post-hoc OOD detection method designed for scalable geospatial deployment. Our core innovation lies in generating surrogate distribution labels by leveraging ID data within the feature space. TARDIS takes a pre-trained model, ID data, and data from an unknown distribution (WILD), separates WILD into surrogate ID and OOD labels based on internal activations, and trains a binary classifier to detect distribution shifts. We validate on EuroSAT and xBD across 17 setups covering covariate and semantic shifts, showing near-upper-bound surrogate labeling performance in 13 cases and matching the performance of top post-hoc activation- and scoring-based methods. Finally, deploying TARDIS on Fields of the World reveals actionable insights into pre-trained model behavior at scale. The code is available at https://github.com/microsoft/geospatial-ood-detection

Summary

The paper introduces TARDIS, a post-hoc method for large-scale out-of-distribution detection in Earth Observation using surrogate labels without model retraining.
TARDIS uses internal model activations to split data from an unknown distribution (WILD) into surrogate in-distribution and out-of-distribution labels, so a binary OOD detector can be trained without any labeled OOD examples.
Across 17 setups on EuroSAT and xBD, surrogate labeling comes close to its upper bound in 13 cases, and a deployment on the Fields of the World dataset (Sentinel-2 imagery) shows how the method behaves at scale.

Insights into Distribution Shifts at Scale in Earth Observation

The paper "Distribution Shifts at Scale: Out-of-distribution Detection in Earth Observation" addresses a central problem in deploying deep learning models for Earth Observation (EO): detecting out-of-distribution (OOD) inputs. Such inputs degrade model performance, particularly in geospatial deployments, where temporal, spatial, and environmental variability make distribution shifts pervasive.

Core Contributions

This paper introduces TARDIS (Test-time Addressing of Distribution Shifts at Scale), a post-hoc method for detecting OOD samples at large scale without compromising the model's performance on its primary task. TARDIS handles unknown test-time distributions by generating surrogate labels from in-distribution (ID) data and data from an unknown distribution. It uses the model's internal activations to assign surrogate ID and OOD labels, then trains a binary classifier on those labels, so no explicit OOD examples are needed during training.

Methodological Innovation

TARDIS fits into existing workflows: it does not require retraining the model or modifying its architecture. It takes a pre-trained model, a dataset of known ID samples, and samples from an unknown distribution, termed WILD samples. It clusters the activation features of these samples and applies a threshold to assign each WILD sample a surrogate ID or OOD label. In most of the tested scenarios, the resulting surrogate labels come close to the upper bound on OOD detection performance.
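The surrogate-labeling idea can be illustrated with a small, standard-library-only sketch. It is not the authors' implementation: the function names, the farthest-first k-means initialization, the cluster rule (a WILD sample counts as surrogate ID when known ID samples make up at least `id_fraction_threshold` of its cluster), the logistic-regression detector, and the toy 2-D "activations" are all assumptions made for illustration.

```python
import math
import random

def sqdist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    # Farthest-first initialization: deterministic and spreads centers out.
    centers = [list(points[0])]
    while len(centers) < k:
        centers.append(list(max(points, key=lambda p: min(sqdist(p, c) for c in centers))))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        assign = [min(range(k), key=lambda c: sqdist(p, centers[c])) for p in points]
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def surrogate_labels(id_feats, wild_feats, k=2, id_fraction_threshold=0.1):
    """Label each WILD sample 0 (surrogate ID) or 1 (surrogate OOD).

    Assumed rule: surrogate ID when the share of known ID samples in the
    sample's cluster is at least `id_fraction_threshold`.
    """
    assign = kmeans(id_feats + wild_feats, k)
    n_id = len(id_feats)
    labels = []
    for cluster in assign[n_id:]:
        id_share = assign[:n_id].count(cluster) / assign.count(cluster)
        labels.append(0 if id_share >= id_fraction_threshold else 1)
    return labels

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic(feats, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression detector with batch gradient descent."""
    dim, n = len(feats[0]), len(feats)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def ood_score(w, b, x):
    """Detector output in [0, 1]; higher means more likely OOD."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy 2-D "activations": ID samples near 0, shifted samples near 10.
rng = random.Random(0)
def blob(center, n):
    return [[rng.gauss(center, 0.5), rng.gauss(center, 0.5)] for _ in range(n)]

id_feats = blob(0.0, 40)                      # known ID samples
wild_feats = blob(0.0, 20) + blob(10.0, 20)   # WILD: unknown mix of both
wild_labels = surrogate_labels(id_feats, wild_feats)
w, b = train_logistic(id_feats + wild_feats, [0] * len(id_feats) + wild_labels)
```

In a real pipeline the feature vectors would come from an intermediate layer of the pre-trained model rather than from synthetic blobs, and the number of clusters and the threshold would need tuning.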

Empirical Validation

The authors validate TARDIS on the EuroSAT and xBD datasets across 17 experimental setups covering both covariate and semantic shifts. In 13 of the 17 setups, surrogate labeling performs close to the upper bound, that is, the performance achievable by an oracle that knows the true ID/OOD split. TARDIS also matches the performance of top post-hoc activation- and scoring-based methods. These results across diverse settings suggest it is adaptable enough to be a practical option for global deployments.
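To make the comparison against an oracle concrete, the sketch below computes AUROC for two detectors on the same held-out labels and reports the gap between them. AUROC is assumed here only because it is a common OOD detection metric; this summary does not name the metric the paper reports, and the scores are made-up toy values rather than results from the paper.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random OOD sample (label 1)
    receives a higher score than a random ID sample (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy scores (hypothetical): true labels, 0 = ID and 1 = OOD.
true_labels = [0, 0, 0, 1, 1, 1]
oracle_scores = [0.10, 0.20, 0.30, 0.80, 0.90, 0.95]     # detector trained on true labels
surrogate_scores = [0.10, 0.20, 0.85, 0.80, 0.90, 0.95]  # detector trained on surrogate labels

gap = auroc(oracle_scores, true_labels) - auroc(surrogate_scores, true_labels)
print(f"AUROC gap to oracle: {gap:.3f}")
```

A small gap on held-out data with known shift is the kind of evidence that would support a "near upper bound" claim for a given setup.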

Implications and Future Directions

Deploying TARDIS on the Fields of the World dataset with Sentinel-2 imagery demonstrates that it scales to real-world settings where models face varying distribution shifts. Organizations can therefore deploy models that alert practitioners to likely distribution shifts, which helps direct resources toward further data collection and model retraining.
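One way such alerts could be wired up is to aggregate per-sample OOD scores by region and flag regions where a large share of samples look out-of-distribution. The sketch below is only an illustration: `flag_regions`, both thresholds, the region names, and the scores are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def flag_regions(records, score_threshold=0.5, min_ood_fraction=0.3):
    """Return {region: OOD fraction} for regions worth a closer look.

    `records` is an iterable of (region, ood_score) pairs. A sample counts
    as OOD when its score is at least `score_threshold`; a region is
    flagged when its OOD fraction is at least `min_ood_fraction`.
    """
    totals = defaultdict(int)
    ood_counts = defaultdict(int)
    for region, score in records:
        totals[region] += 1
        if score >= score_threshold:
            ood_counts[region] += 1
    fractions = {region: ood_counts[region] / totals[region] for region in totals}
    return {region: frac for region, frac in fractions.items() if frac >= min_ood_fraction}

# Hypothetical detector outputs for image patches from two regions.
records = [
    ("region_a", 0.1), ("region_a", 0.2),
    ("region_b", 0.9), ("region_b", 0.7), ("region_b", 0.2),
]
flagged = flag_regions(records)
```

Flagged regions would then be candidates for manual review, additional labeling, or retraining.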

The implications of this research extend to model reliability and trustworthiness, transparency in geospatial analytics, and a lower risk of catastrophic decision-making errors caused by model overconfidence. Integrating TARDIS into EO pipelines also supports a human-in-the-loop approach, in which flagged inputs are routed to practitioners for review.

Future research could build on the TARDIS framework by examining in finer detail the activation patterns that drive OOD detection and by exploring integration with real-time adaptive learning strategies. Future work could also investigate the trade-off between surrogate label granularity and computational efficiency, to better balance OOD detection performance against deployment cost.

In conclusion, TARDIS is a step toward practical and scalable OOD detection in Earth Observation, with potential applications in other fields that need robust diagnostic tools for deployed ML models.