UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation (2407.20080v1)

Published 29 Jul 2024 in cs.CV and cs.LG

Abstract: Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributions. Despite these efforts, a unified and comprehensive benchmark has yet to be established. To this end, we propose a Unified Test-Time Adaptation (UniTTA) benchmark, which is comprehensive and widely applicable. Each scenario within the benchmark is fully described by a Markov state transition matrix for sampling from the original dataset. The UniTTA benchmark considers both domain and class as two independent dimensions of data and addresses various combinations of imbalance/balance and i.i.d./non-i.i.d./continual conditions, covering a total of $(2 \times 3)^2 = 36$ scenarios. It establishes a comprehensive evaluation benchmark for realistic TTA and provides a guideline for practitioners to select the most suitable TTA method. Alongside this benchmark, we propose a versatile UniTTA framework, which includes a Balanced Domain Normalization (BDN) layer and a COrrelated Feature Adaptation (COFA) method, designed to mitigate distribution gaps in domain and class, respectively. Extensive experiments demonstrate that our UniTTA framework excels within the UniTTA benchmark and achieves state-of-the-art performance on average. Our code is available at https://github.com/LeapLabTHU/UniTTA.

Summary

The paper introduces UniTTA, a comprehensive benchmark and versatile framework for realistic Test-Time Adaptation (TTA) scenarios involving dynamic domain shifts and class imbalance, in contrast to traditional evaluations that assume i.i.d. test data.
The UniTTA framework includes Balanced Domain Normalization (BDN) to handle domain shifts with class imbalance and COrrelated Feature Adaptation (COFA) to leverage temporal correlation in data streams.
Extensive experiments on benchmark datasets show that the UniTTA framework performs well across the benchmark's scenarios and achieves state-of-the-art performance on average.

The paper, "UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation", addresses Test-Time Adaptation (TTA) of pre-trained models in dynamic, realistic environments. It observes that existing TTA evaluations largely assume independent and identically distributed (i.i.d.) test data or target individual challenges in isolation, and that no comprehensive benchmark yet captures real-world complexities such as continual domain shifts and class imbalance.

At the core of this work is the "Unified Test-Time Adaptation (UniTTA)" benchmark, in which each scenario is fully described by a Markov state transition matrix used to sample from the original dataset. Domain and class are treated as two independent dimensions. Each dimension can be balanced or imbalanced and can follow i.i.d., non-i.i.d., or continual conditions, which gives (2 × 3)² = 36 distinct scenarios.
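A minimal sketch of how such a benchmark could be driven, assuming NumPy. It enumerates the 36 setting pairs and samples a class sequence from a row-stochastic transition matrix. The function names, the `stay_prob` construction, and the example class prior are hypothetical illustrations, not the paper's code, and the sketch covers only the i.i.d. and temporally correlated cases, not the continual one.

```python
import itertools
import numpy as np

# Per-dimension settings named in the summary: 2 balance options x 3 temporal options.
BALANCE = ("balanced", "imbalanced")
TEMPORAL = ("i.i.d.", "non-i.i.d.", "continual")

def enumerate_scenarios():
    """All (domain setting, class setting) pairs: (2 * 3) ** 2 = 36."""
    per_dim = list(itertools.product(BALANCE, TEMPORAL))
    return list(itertools.product(per_dim, per_dim))

def transition_matrix(stationary, stay_prob):
    """Hypothetical construction: with probability `stay_prob` keep the current
    state, otherwise redraw from `stationary`. stay_prob = 0 gives i.i.d.
    sampling; values near 1 give long runs of the same state."""
    stationary = np.asarray(stationary, dtype=float)
    stationary = stationary / stationary.sum()
    k = len(stationary)
    return stay_prob * np.eye(k) + (1.0 - stay_prob) * np.tile(stationary, (k, 1))

def sample_states(P, length, rng):
    """Draw a state sequence from the Markov chain with transition matrix P."""
    k = P.shape[0]
    states = np.empty(length, dtype=int)
    states[0] = rng.integers(k)  # uniform start state (an arbitrary choice)
    for t in range(1, length):
        states[t] = rng.choice(k, p=P[states[t - 1]])
    return states

rng = np.random.default_rng(0)
class_prior = [0.5, 0.3, 0.2]  # made-up imbalanced class distribution
iid_seq = sample_states(transition_matrix(class_prior, 0.0), 2000, rng)
corr_seq = sample_states(transition_matrix(class_prior, 0.95), 2000, rng)
```

With this construction the chain's stationary distribution equals the chosen prior, so imbalance and temporal correlation can be varied independently. The same sampler could be applied to domain indices as well as class labels.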

The paper goes further by introducing an accompanying versatile framework also named UniTTA, which incorporates two novel components:

Balanced Domain Normalization (BDN): This component addresses domain distribution shifts through dynamic recalibration by class, achieving a balance in domain-wise statistics despite class imbalances. BDN employs statistics computed per class within each domain and then averages these to counteract class imbalance effects.
COrrelated Feature Adaptation (COFA): This method leverages temporal correlation between classes by referencing the features of previous samples, effectively adapting to temporally correlated data streams without altering model parameters. COFA includes a confidence-based filtering mechanism to apply its adaptations selectively, allowing robust performance across both i.i.d. and non-i.i.d. scenarios.
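The class-averaging step described for BDN can be illustrated with a short sketch, assuming NumPy. It computes per-class statistics for one domain and averages them with equal weight per class. The function name is hypothetical, and the sketch takes labels as given, whereas at test time they would have to come from model predictions. It is not the paper's layer, which also has to assign samples to domains and update statistics online.

```python
import numpy as np

def class_balanced_stats(features, labels, num_classes):
    """Mean and variance of one domain's features with every observed class
    weighted equally, so a dominant class cannot skew the statistics."""
    class_means, class_second_moments = [], []
    for c in range(num_classes):
        members = features[labels == c]
        if len(members) == 0:  # class not observed in this domain
            continue
        class_means.append(members.mean(axis=0))
        class_second_moments.append((members ** 2).mean(axis=0))
    mean = np.mean(class_means, axis=0)
    # Variance of the equal-weight class mixture: E[x^2] - (E[x])^2.
    var = np.mean(class_second_moments, axis=0) - mean ** 2
    return mean, var

# Toy imbalanced batch: 90 samples of class 0 at value 0, 10 of class 1 at value 10.
features = np.concatenate([np.zeros((90, 1)), np.full((10, 1), 10.0)])
labels = np.array([0] * 90 + [1] * 10)

naive_mean = features.mean(axis=0)  # 1.0, dominated by the majority class
balanced_mean, balanced_var = class_balanced_stats(features, labels, num_classes=3)
# balanced_mean is 5.0 and balanced_var is 25.0: both classes count equally
```

The toy batch shows the effect the summary describes: the naive batch mean follows the majority class, while the class-averaged statistics are the same whatever the class proportions are.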

Experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C show that the UniTTA framework performs well across the benchmark's wide range of challenging test-time scenarios and achieves state-of-the-art performance on average. The paper's empirical analysis also supports the necessity and practicality of both BDN and COFA.

In conclusion, this work fills a critical gap in TTA by providing both a unified evaluation framework and a robust adaptation method capable of handling realistic deployment scenarios, thus offering a blueprint for future research and application in this domain.
