
Domain Adaptation without Source Data (2007.01524v4)

Published 3 Jul 2020 in cs.CV, cs.LG, and eess.IV

Abstract: Domain adaptation assumes that samples from source and target domains are freely accessible during a training phase. However, such an assumption is rarely plausible in the real-world and possibly causes data-privacy issues, especially when the label of the source domain can be a sensitive attribute as an identifier. To avoid accessing source data that may contain sensitive information, we introduce Source data-Free Domain Adaptation (SFDA). Our key idea is to leverage a pre-trained model from the source domain and progressively update the target model in a self-learning manner. We observe that target samples with lower self-entropy measured by the pre-trained source model are more likely to be classified correctly. From this, we select the reliable samples with the self-entropy criterion and define these as class prototypes. We then assign pseudo labels for every target sample based on the similarity score with class prototypes. Furthermore, to reduce the uncertainty from the pseudo labeling process, we propose set-to-set distance-based filtering which does not require any tunable hyperparameters. Finally, we train the target model with the filtered pseudo labels with regularization from the pre-trained source model. Surprisingly, without direct usage of labeled source samples, our PrDA outperforms conventional domain adaptation methods on benchmark datasets. Our code is publicly available at https://github.com/youngryan1993/SFDA-SourceFreeDA

Authors (5)
  1. Youngeun Kim (48 papers)
  2. Donghyeon Cho (20 papers)
  3. Kyeongtak Han (3 papers)
  4. Priyadarshini Panda (104 papers)
  5. Sungeun Hong (18 papers)
Citations (159)

Summary

Domain Adaptation without Source Data: A Comprehensive Overview

This paper introduces a novel approach for domain adaptation in scenarios where direct access to source data is constrained by privacy concerns or legal restrictions. The proposed setting, termed Source data-Free Domain Adaptation (SFDA), relies solely on a pre-trained source model rather than on the availability of source domain samples. This paradigm shift addresses a critical challenge in real-world applications where data privacy is paramount, such as fields dealing with sensitive biometric information.

The authors begin by highlighting the traditional assumption in unsupervised domain adaptation (UDA) that requires access to both labeled source data and unlabeled target data during training. This assumption is often impractical in sensitive applications. SFDA circumvents this issue by leveraging a pre-trained model from the source domain and updating the target model in a self-learning manner. This enables domain adaptation without direct access to source samples, thus preserving privacy and compliance with data protection regulations.

Methodology

The core methodology of SFDA is predicated on the self-entropy criterion for selecting reliable samples from the target domain. The paper posits that target samples with lower self-entropy, as assessed by the pre-trained source model, are more likely to be classified correctly. These reliable samples are used to define class prototypes, which in turn guide the assignment of pseudo labels to target samples based on similarity scores.
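The selection-and-labeling procedure described above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the feature arrays, the number of reliable samples per class (`per_class`), and the use of cosine similarity as the similarity score are assumptions for the sake of a runnable example.

```python
import numpy as np

def self_entropy(probs):
    """Entropy of each sample's predicted class distribution (lower = more confident)."""
    return -np.sum(probs * np.log(probs + 1e-12), axis=1)

def build_prototypes(features, probs, per_class=5):
    """Average the features of the lowest-entropy samples per predicted class."""
    entropy = self_entropy(probs)
    preds = probs.argmax(axis=1)
    n_classes = probs.shape[1]
    prototypes = np.zeros((n_classes, features.shape[1]))
    for c in range(n_classes):
        idx = np.where(preds == c)[0]
        reliable = idx[np.argsort(entropy[idx])[:per_class]]
        prototypes[c] = features[reliable].mean(axis=0)
    return prototypes

def assign_pseudo_labels(features, prototypes):
    """Pseudo label = class whose prototype has the highest cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = f @ p.T
    return sim.argmax(axis=1), sim
```

In practice `features` and `probs` would come from the penultimate layer and softmax output of the pre-trained source model evaluated on unlabeled target samples.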

To enhance the reliability of pseudo labels amidst potential uncertainty, the authors propose a set-to-set distance-based filtering mechanism. This filtering process eschews the need for tunable hyperparameters, opting instead for a distance metric that determines the confidence level of pseudo labels assigned to samples.
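One plausible instantiation of a hyperparameter-free, distance-based filter (a sketch in the spirit of the paper's set-to-set filtering, not its exact formulation) is to keep a pseudo-labeled sample only if it lies closer to its assigned class's reliable set than to any other class's set. The `class_sets` dictionary of per-class reliable feature sets is an assumed input:

```python
import numpy as np

def set_to_set_filter(features, pseudo_labels, class_sets):
    """Keep a sample only if its nearest neighbor among its assigned class's
    reliable samples is closer than its nearest neighbor in every other class's
    set. The comparison is relative, so no distance threshold is needed."""
    keep = np.zeros(len(features), dtype=bool)
    for i, (f, c) in enumerate(zip(features, pseudo_labels)):
        # point-to-set distance: minimum Euclidean distance to each class set
        dists = {k: np.min(np.linalg.norm(s - f, axis=1))
                 for k, s in class_sets.items()}
        keep[i] = min(dists, key=dists.get) == c
    return keep
```

Samples rejected by the filter are simply excluded from the pseudo-labeled training set for that round, so the filter's strictness adapts to the data rather than to a tuned threshold.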

Key Results

The proposed SFDA framework demonstrated superior performance compared to conventional UDA methods that utilize labeled source data, as evidenced by experiments on benchmark datasets like Office-31, Office-Home, and VisDA-C. Notably, SFDA outperforms these traditional methods without direct use of labeled source samples, indicating the robustness and efficacy of the approach in preserving source knowledge through a pre-trained model.

Implications and Future Directions

From a theoretical standpoint, SFDA provides insight into how pre-trained models can be utilized effectively for domain adaptation without accessing potentially sensitive source data. This has significant implications for privacy-preserving machine learning, enabling the adoption of AI solutions in sensitive domains while ensuring compliance with data privacy laws such as GDPR.

Practically, the introduction of SFDA paves the way for deploying domain adaptation techniques in environments with strict data access limitations, such as healthcare and finance. This method has the potential to become a cornerstone strategy for organizations looking to balance the benefits of data-driven decision-making with the need for stringent data privacy.

Future research could extend SFDA to more complex scenarios, such as open-set or partial domain adaptation, where source and target domain classes may not perfectly overlap. Additionally, exploring more sophisticated methods for assessing and improving the reliability of pseudo labels could further enhance the performance and applicability of SFDA in diverse domains.

In conclusion, the SFDA method provides a compelling path forward for domain adaptation in privacy-sensitive contexts, and its development represents a significant step towards more secure and compliant AI systems.
