
AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation (2304.12566v2)

Published 25 Apr 2023 in cs.LG

Abstract: Many recent machine learning tasks focus on developing models that can generalize to unseen distributions. Domain generalization (DG) has become one of the key topics in various fields. Several studies have shown that DG can be arbitrarily hard without exploiting target domain information. To address this issue, test-time adaptation (TTA) methods have been proposed. Existing TTA methods require offline target data or extra sophisticated optimization procedures during inference. In this work, we adopt a Non-Parametric Classifier to perform test-time Adaptation (AdaNPC). In particular, we construct a memory that contains feature and label pairs from the training domains. During inference, given a test instance, AdaNPC first recalls the K closest samples from the memory to vote for the prediction, and then the test feature and predicted label are added to the memory. In this way, the sample distribution in the memory can gradually shift from the training distribution toward the test distribution at very little extra computational cost. We theoretically justify the rationale behind the proposed method and validate it through extensive numerical experiments. AdaNPC significantly outperforms competitive baselines on various DG benchmarks. In particular, when the adaptation target is a series of domains, the adaptation accuracy of AdaNPC is 50% higher than that of advanced TTA methods. The code is available at https://github.com/yfzhang114/AdaNPC.

Citations (34)

Summary

  • The paper introduces AdaNPC, a novel non-parametric approach leveraging a KNN classifier and memory bank for efficient test-time adaptation in online settings to improve domain generalization.
  • AdaNPC adapts models by recalling nearest samples from a memory bank for prediction voting and continuously updating the memory with test data, theoretically reducing domain divergence.
  • Empirical results show AdaNPC improves generalization accuracy across benchmarks and demonstrates resilience against catastrophic forgetting, making it suitable for real-time inference on edge devices.

AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation

The paper "AdaNPC: Exploring Non-Parametric Classifier for Test-Time Adaptation" introduces a novel method for improving domain generalization through test-time adaptation without additional parameter tuning. The proposed AdaNPC leverages a non-parametric classifier, specifically a k-nearest neighbors (KNN) classifier, to adapt models on-the-fly in online settings. This approach addresses two critical challenges in domain generalization: the degradation of performance due to domain distribution shifts and the tendency of existing methods to forget knowledge from source domains.

Key Concepts and Methodology

The AdaNPC method capitalizes on a memory bank containing features and labels from training domains to facilitate adaptation during inference. When presented with a test instance, the model recalls the k nearest samples for prediction voting, subsequently updating the memory with the test feature and its predicted label. This continuous adaptation process allows the distribution in memory to gradually shift from the training towards the test distribution with minimal computational overhead.
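
Conceptually, the inference loop can be sketched as follows. This is a minimal illustration of the recall-vote-update cycle, not the authors' released implementation; the class name, the cosine-similarity metric, and the unweighted majority vote are assumptions for the sketch.

```python
import numpy as np
from collections import Counter

class AdaNPCMemory:
    """Illustrative sketch of AdaNPC-style test-time adaptation with a
    non-parametric (KNN) classifier over a memory bank."""

    def __init__(self, features: np.ndarray, labels: np.ndarray, k: int = 10):
        # Memory initialized with (feature, label) pairs from the training domains.
        self.features = features  # shape (n, d); assumed L2-normalized
        self.labels = labels      # shape (n,)
        self.k = k

    def predict_and_update(self, z: np.ndarray) -> int:
        # 1. Recall the k nearest stored samples (cosine similarity here;
        #    the paper's choice of distance may differ).
        sims = self.features @ z
        nearest = np.argsort(-sims)[: self.k]
        # 2. Majority vote over the recalled neighbors' labels.
        y_hat = Counter(self.labels[nearest].tolist()).most_common(1)[0][0]
        # 3. Append the test feature with its pseudo-label, so the memory
        #    distribution gradually drifts from training toward test data.
        self.features = np.vstack([self.features, z[None, :]])
        self.labels = np.append(self.labels, y_hat)
        return y_hat
```

Because adaptation amounts to a nearest-neighbor lookup plus an append, no gradient step or parameter update is needed at test time, which is the source of the method's low overhead.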

The theoretical foundation of AdaNPC is built on the Wasserstein-distance guided representation learning theory, where the risk on the unseen target domain is bounded by the source domain risk and the Wasserstein distance between source and target distributions. The incorporation of online test samples into the memory bank further tightens this bound by reducing domain divergence through structured representations.
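
A representative bound of this shape, following Wasserstein-distance guided representation learning, is shown below; the constants and exact assumptions are illustrative rather than copied from the paper.

```latex
% For an L-Lipschitz loss, a bound of this general form holds:
\epsilon_{T}(h) \;\le\; \epsilon_{S}(h) \;+\; 2L\, W_{1}(\mu_{S}, \mu_{T}) \;+\; \lambda^{*}
```

Here ε_S(h) and ε_T(h) are the source and target risks of hypothesis h, W_1 is the first Wasserstein distance between the source (μ_S) and target (μ_T) feature distributions, and λ* is the error of an ideal joint hypothesis. Adding confidently pseudo-labeled test samples to the memory effectively shrinks the W_1 term, which is how the bound tightens.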

Theoretical Insights

  • Reduction in Domain Divergence: AdaNPC's use of a KNN classifier inherently reduces domain divergence by selecting stored data points similar to the target samples, thereby shrinking the effective Wasserstein distance.
  • Impact of Test-Time Adaptation: By integrating test-time samples into the memory bank, AdaNPC effectively diminishes excess risk under both covariate-shift and posterior-shift settings. The assumption that pseudo-labeling accurately estimates the target distribution is pivotal in justifying tighter error bounds.
  • Parameterization: Strong results are obtained when the number of neighbors (k) scales logarithmically with the sample size, balancing the trade-off between neighborhood specificity and generalization capability (see the sketch after this list).
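
As an illustration of that logarithmic scaling rule (the constant c and the exact schedule used in the paper may differ), k could grow with the memory size like this:

```python
import math

def adaptive_k(memory_size: int, c: float = 3.0, k_min: int = 1) -> int:
    # k grows logarithmically with the number of stored samples, trading
    # neighborhood specificity against generalization capability.
    return max(k_min, int(round(c * math.log(memory_size + 1))))
```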

Empirical Validation

The model's performance was evaluated across numerous domain generalization benchmarks such as Rotated MNIST, PACS, VLCS, TerraIncognita, and DomainNet. AdaNPC displayed significant improvements in generalization accuracy over competitive baselines. Notably, successive domain adaptation experiments demonstrated AdaNPC's resilience against catastrophic forgetting, a common pitfall of traditional test-time adaptation methods like Tent and T3A.

Future Directions and Practical Implications

The implications of AdaNPC extend to practical scenarios where deployed models must continuously adapt to new environments. The non-parametric approach avoids additional training overhead, making AdaNPC particularly well suited to edge devices with real-time inference needs.

Furthermore, AdaNPC opens a dialogue on leveraging large pre-trained models without fine-tuning. This direction is promising, especially as computational costs for model updates escalate with the scale of pre-trained networks.

Conclusion

AdaNPC presents a compelling argument for using non-parametric strategies in test-time adaptation, offering both theoretical and empirical advantages. It reshapes the expectations around model adaptation to unseen domains while providing a robust solution to the domain forgetting problem. The method paves the way for more adaptive, memory-efficient solutions, with wide-ranging applications in AI deployments across diverse and evolving environments.