
Class-Incremental Unsupervised Domain Adaptation

Updated 21 September 2025
  • CI-UDA is a learning paradigm where models trained on labeled source data are continuously adapted to evolving unlabeled target domains that introduce new classes over time.
  • It employs prototype-based alignment, attribute modeling, and distributional techniques to address domain shift and mitigate catastrophic forgetting effectively.
  • Empirical benchmarks demonstrate that CI-UDA methods improve accuracy and retention, supporting robust performance in dynamic, real-world applications.

Class-Incremental Unsupervised Domain Adaptation (CI-UDA) refers to the problem of continually adapting models trained on a labeled source domain to an evolving, unlabeled target domain in which new classes (or subsets of classes) are introduced incrementally over time. Unlike conventional unsupervised domain adaptation, CI-UDA must address both domain shift and the challenge of catastrophic forgetting, as target classes present at earlier time steps may become unavailable in later adaptation rounds. The field combines insights from domain adaptation, continual learning, representation alignment, and prototype-based methods to advance robust generalization under non-stationary, label-incomplete conditions.

1. Problem Formulation and Challenges

Class-Incremental Unsupervised Domain Adaptation defines a setting in which the source domain contains labeled data for all classes, while the target domain provides unlabeled data in sequential sessions, each covering a disjoint subset of the source classes (Lin et al., 2022, Deng et al., 25 Nov 2024). At each incremental step, only the current session’s target data is available; data for earlier classes may not be retained, and new classes may appear that had no counterpart in prior sessions. The principal challenges are listed below; a schematic sketch of the session protocol follows the list.

  • Domain Shift: Feature distributions between source and target domains differ, degrading target performance if not properly aligned (Gallego et al., 2020, Marsden et al., 2022).
  • Class Incrementality: Target label sets evolve over time, challenging adaptation methods that expect fixed class correspondence (Kundu et al., 2020).
  • Catastrophic Forgetting: Adapting to new target classes may overwrite knowledge about previous classes if explicit memory mechanisms (e.g., rehearsal, topology distillation) are not used (Deng et al., 25 Nov 2024, Rostami, 31 Jan 2024).
  • Partial or Source-Free Regimes: In some variants (CI-SFUDA), source data is unavailable during adaptation, increasing the need for knowledge transfer via model parameters, prototypes, or other compressed representations (Deng et al., 25 Nov 2024).
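To make the session protocol concrete, here is a minimal sketch (helper names are illustrative, not drawn from any cited implementation): ground-truth target labels are used only to partition classes into disjoint sessions, and the adapting model receives nothing but the unlabeled data of the current session.

```python
import numpy as np

def make_ciuda_sessions(target_X, target_y, class_sessions, seed=0):
    """Split a target dataset into incremental unlabeled sessions.

    class_sessions: disjoint class subsets, e.g. [[0, 1, 2], [3, 4]].
    target_y defines the protocol only; labels are never exposed to
    the adapting model.
    """
    rng = np.random.default_rng(seed)
    sessions = []
    for classes in class_sessions:
        idx = np.where(np.isin(target_y, classes))[0]
        sessions.append(target_X[rng.permutation(idx)])  # unlabeled data only
    return sessions
```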

2. Core Methodologies

The literature converges on several methodological approaches to address CI-UDA:

2.1 Prototype-Based and Alignment Methods

Prototype-guided continual adaptation (ProCA) and topology distillation frameworks use class feature prototypes as anchors for both domain alignment and memory replay (Lin et al., 2022, Deng et al., 25 Nov 2024). Prototypes are computed per class using feature centers and regularly updated as class data arrives incrementally. Topological relationships between source and target prototypes are enforced by compactness and separability losses.
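As a minimal sketch of this pattern (assuming feature vectors from a shared encoder; helper names are illustrative rather than taken from the ProCA codebase), prototypes are class-wise feature means, and target samples receive pseudo-labels by nearest-prototype matching:

```python
import torch
import torch.nn.functional as F

def class_prototypes(features, labels, num_classes):
    """Per-class mean feature vectors (prototypes) from labeled source data."""
    protos = torch.stack([features[labels == c].mean(dim=0)
                          for c in range(num_classes)])
    return F.normalize(protos, dim=1)

def pseudo_label(target_feats, protos, threshold=0.8):
    """Assign each target feature to its nearest prototype by cosine
    similarity; keep only confident assignments."""
    sims = F.normalize(target_feats, dim=1) @ protos.T  # (N, C)
    conf, labels = sims.max(dim=1)
    keep = conf > threshold
    return labels[keep], target_feats[keep]
```

Confident assignments can then serve both as alignment targets and as replay anchors when later sessions arrive.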

2.2 Attribute and Prompt-Based Representation (CLIP/VisTA)

Attribute modeling approaches mine class-agnostic, transferable knowledge by representing each image through attribute key-value pairs (a visual prototype paired with a text prompt), leveraging vision-language models such as CLIP (Mi et al., 14 Sep 2025). Separate attribute dictionaries are constructed for the source and target domains, and cross-domain alignment is enforced via visual attention consistency (attention-map similarity) and prediction consistency (Jensen–Shannon divergence between prompt-based probability vectors).
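A minimal sketch of the prediction-consistency term, assuming two probability vectors obtained from source-domain and target-domain attribute prompts (the attention-consistency term and the CLIP machinery itself are omitted):

```python
import torch

def js_divergence(p, q, eps=1e-8):
    """Jensen-Shannon divergence between rows of two probability
    matrices (each row sums to 1), used as a consistency loss between
    prompt-based predictions from the two domains."""
    m = 0.5 * (p + q)
    kl_pm = (p * ((p + eps).log() - (m + eps).log())).sum(dim=-1)
    kl_qm = (q * ((q + eps).log() - (m + eps).log())).sum(dim=-1)
    return 0.5 * (kl_pm + kl_qm)

# e.g., loss = js_divergence(probs_src_prompts, probs_tgt_prompts).mean()
```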

2.3 Distributional Alignment and Stabilized Representations

Models such as IMUDA and LDAuCID impose compactness on internal latent representations by fitting a Gaussian mixture model (GMM) to the source encoder outputs, then align new target-domain representations by minimizing a Wasserstein metric (e.g., the Sliced Wasserstein Distance, SWD) between the target and GMM distributions (Rostami, 14 Jan 2024, Rostami, 31 Jan 2024). Experience replay buffers retain representative samples from previous tasks to prevent forgetting.
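The SWD term admits a compact Monte Carlo estimator based on random one-dimensional projections. The sketch below assumes equal-sized feature batches and omits the GMM fit and replay buffer (illustrative, not the cited papers' code):

```python
import torch

def sliced_wasserstein(src_feats, tgt_feats, n_projections=128):
    """Monte Carlo estimate of the squared sliced Wasserstein distance
    between two equal-sized batches of latent features."""
    d = src_feats.size(1)
    # Random projection directions on the unit sphere
    gamma = torch.randn(d, n_projections, device=src_feats.device)
    gamma = gamma / gamma.norm(dim=0, keepdim=True)
    # Project to 1-D, then sort to pair order statistics
    src_proj, _ = (src_feats @ gamma).sort(dim=0)
    tgt_proj, _ = (tgt_feats @ gamma).sort(dim=0)
    return ((src_proj - tgt_proj) ** 2).mean()
```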

2.4 Lightweight Style Transfer for Segmentation

Segmentation-centric approaches employ class-conditional AdaIN layers for style adaptation, using pseudo-labels to approximate target moments in feature maps (Marsden et al., 2022). Memory banks store class-wise style statistics for replay, addressing the forward and backward domain gap as new target domains appear.
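A minimal AdaIN sketch, assuming per-class target style statistics (channel means and standard deviations) have already been estimated and stored in a memory bank; the class-conditional selection by pseudo-label is noted in the docstring:

```python
import torch

def class_adain(content, style_mean, style_std, eps=1e-5):
    """Adaptive instance normalization: re-normalize content feature
    maps (N, C, H, W) to target-style channel statistics. In the
    class-conditional variant, style_mean/style_std come from a memory
    bank of per-class target statistics chosen by each region's
    pseudo-label."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    return (content - c_mean) / c_std * style_std + style_mean
```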

3. Mathematical Foundations and Theoretical Frameworks

Recurring formalisms across CI-UDA works codify the risks and training objectives:

  • Gradient Update in Adversarial Training:

$$\theta_f \leftarrow \theta_f - \mu \left( \frac{\partial \mathcal{L}_y}{\partial \theta_f} - \lambda \frac{\partial \mathcal{L}_d}{\partial \theta_f} \right)$$

This adversarial weight update enforces domain invariance in feature extraction (Gallego et al., 2020).
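Such updates are commonly realized with a gradient reversal layer, which leaves the forward pass untouched and scales gradients by -λ on the backward pass; a minimal PyTorch sketch (a standard construction, not the cited paper's code):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; multiplies incoming gradients by
    -lam on the backward pass, realizing the update above."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None  # no gradient for lam

# features = encoder(x)
# domain_logits = domain_classifier(GradReverse.apply(features, lam))
```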

  • Prototype Distance and Pseudo-Labeling:

$$k = \operatorname{argmin}_{c \in \mathcal{C}_t} \|v_t - v_g^c\|_2$$

Target samples are assigned pseudo-labels by minimizing Euclidean or cosine distance to trainable class guides (Kundu et al., 2020, Deng et al., 25 Nov 2024).

  • GMM-Based Representation Compactness:

$$p_J(\mathbf{z}) = \sum_{j=1}^k \alpha_j\, \mathcal{N}(\mathbf{z} \mid \mu_j, \Sigma_j)$$

Internal latent distributions are modeled as GMMs with class-wise statistics (Rostami, 14 Jan 2024).
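A sketch of this fit using scikit-learn's GaussianMixture, with components initialized at class-wise feature means so that each component tracks one class (a simplification of the cited formulation); samples from the fitted model can stand in for raw source data during later alignment:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_latent_gmm(z, y, n_classes):
    """Fit one full-covariance Gaussian per class to latent features z,
    initializing component means at the class-wise feature means."""
    means_init = np.stack([z[y == c].mean(axis=0) for c in range(n_classes)])
    gmm = GaussianMixture(n_components=n_classes, covariance_type="full",
                          means_init=means_init).fit(z)
    return gmm  # weights_, means_, covariances_ ~ alpha_j, mu_j, Sigma_j

# Pseudo-source for alignment when raw source data is unavailable:
# z_pseudo, component = gmm.sample(1024)
```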

  • SWD for Distribution Alignment:

$$SW^2(p_S, p_T) \approx \frac{1}{L} \sum_{l=1}^L \sum_{i=1}^M \left| \langle \gamma_l, \phi(x_s[i]) \rangle - \langle \gamma_l, \phi(x_t[i]) \rangle \right|^2$$

This metric aligns source and target feature distributions (Rostami, 31 Jan 2024).

  • CLIP-Based Probability Prediction:

$$p(y = k \mid x) = \frac{\exp(\cos(w_k, z) / \tau)}{\sum_{c=1}^C \exp(\cos(w_c, z) / \tau)}$$

Image and class-prompt similarity drive label prediction (Mi et al., 14 Sep 2025).
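This rule transcribes directly into code; the sketch assumes precomputed CLIP image and per-class text-prompt embeddings (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def clip_probs(image_emb, text_embs, tau=0.01):
    """Softmax over temperature-scaled cosine similarities between one
    image embedding (d,) and C text-prompt embeddings (C, d)."""
    z = F.normalize(image_emb, dim=-1)
    w = F.normalize(text_embs, dim=-1)
    logits = (w @ z) / tau  # cosine similarities / temperature
    return logits.softmax(dim=-1)
```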

4. Empirical Findings Across Benchmarks

Experiments on Office-31/Office-Home/Mini-DomainNet and segmentation datasets such as SYNTHIA-SEQ and BuildCrack show:

  • ProCA achieves step-level and final accuracy gains over rehearsal-based and prototype-free baselines, specifically minimizing performance degradation on classes from earlier time steps (Lin et al., 2022).
  • VisTA advances prior art by 0.7% (Office-31), 8.7% (Office-Home), and 24.2% (Mini-DomainNet) on final accuracy compared to PLDCA, indicating effective long-term retention and minimal forgetting in a rehearsal-free regime (Mi et al., 14 Sep 2025).
  • Continual AdaIN and CACE approaches yield higher mean Intersection-over-Union (mIoU) for segmentation on both synthetic and real-world sequences by leveraging class-specific style transfer and style replay, surpassing simpler color jitter/AdaIN baselines by several percentage points (Marsden et al., 2022).
  • GROTO, via multi-granularity prototype topology distillation, maintains higher final and session-wise accuracy compared to other source-free CI-UDA methods, with ablation studies confirming the importance of positive class mining and PTD modules (Deng et al., 25 Nov 2024).
  • Improvements in crack segmentation (CrackUDA) demonstrate that incremental adversarial UDA with domain-specific decoding raises mIoU over source-only and standard UDA baselines by 0.65 points on the source domain and 2.7 points on the target domain (Srivastava et al., 20 Dec 2024).

5. Memory, Rehearsal, and Privacy Considerations

A notable point of contention addressed by recent works is the reliance on rehearsal buffers to mitigate catastrophic forgetting.

  • ProCA utilizes a target prototype memory bank, updated in real time, which avoids storing raw target samples yet supplies representative replay anchors for alignment (Lin et al., 2022).
  • VisTA is completely rehearsal-free, maintaining attribute dictionaries rather than sample buffers, which has positive implications for memory scalability and privacy (Mi et al., 14 Sep 2025).
  • GROTO does not require access to prior target sessions nor source data during adaptation, achieving source-free adaptation via topology distillation (Deng et al., 25 Nov 2024).
  • This suggests that moving toward more abstract, prototype- or attribute-based storage and alignment methods is a pragmatic response to real-world memory and privacy constraints in CI-UDA deployments.

6. Practical Applications, Impact, and Future Directions

CI-UDA underpins systems in robotics, video surveillance, medical imaging, and infrastructure monitoring, where deployment environments evolve and labeled data acquisition is costly or unfeasible. The following aspects are salient:

  • Prototype-guided alignment and memory replay are recommended for safety-critical continual adaptation applications, as demonstrated in civil structure crack monitoring (Srivastava et al., 20 Dec 2024) and video analytics (Lin et al., 2022).
  • Attribute-level alignment through CLIP-based mechanisms allows for lightweight adaptation and better privacy compliance, suitable for edge devices and settings with limited computational/storage resources (Mi et al., 14 Sep 2025).
  • Future research is likely to focus on dynamic prototype updating, refined positive class mining, further abstraction of memory mechanisms, and integration with meta-learning or contrastive self-supervision to better handle rapidly evolving data streams (Lin et al., 2022, Deng et al., 25 Nov 2024).
  • An emphasis on theoretical generalization bounds and the development of more robust distributional alignment metrics is expected, building on compact internal representation and SWD alignment frameworks (Rostami, 31 Jan 2024, Rostami, 14 Jan 2024).

7. Synthesis and Concluding Remarks

Class-Incremental Unsupervised Domain Adaptation is a rapidly advancing area at the interface of machine learning, continual learning, and domain adaptation. Recent research demonstrates that prototype-guided, topology-driven, attribute-aligned, and distributionally compact models reduce catastrophic forgetting and address complex non-stationary adaptation scenarios. Attribute modeling with vision-language models and prototype topology distillation are notable innovations. These developments suggest that future CI-UDA systems will combine low-memory, privacy-preserving, and highly adaptive mechanisms, leveraging multi-modal representation learning, robust statistical alignment, and dynamic prototype discovery to meet the demands of real-world deployment in open, continuously changing environments.
