SCM-ReID: GAN-based Domain Adaptation
- SCM-ReID is a framework that combines GAN-based image translation and IQA reweighting to address appearance variations in cross-domain person re-identification.
- IQAGA utilizes StarGAN for style adaptation and a ResNet-50 backbone with an IQA-weighted loss, reaching 32–36% mAP on benchmarks such as Market→Duke and Duke→Market.
- DAPRH builds on IQAGA by integrating domain-invariant mapping, Vision Transformer-based holistic features, and pseudo-label refinement to boost mAP to over 70% on Market→Duke.
Person re-identification (ReID) under domain shift is a principal challenge for intelligent surveillance, primarily due to appearance variations and domain discrepancies between cameras or datasets. IQAGA (Image Quality–driven GAN Augmentation) and DAPRH (GAN Augmentation + Pseudo-Label Refinement + Holistic features) are two advanced unsupervised domain adaptation (UDA) methods that integrate generative augmentation, domain adaptation, and discriminative learning to improve cross-domain ReID performance. Both methods leverage StarGAN-based image translation, but they differ in their domain alignment and target-domain learning strategies, achieving substantial gains in mAP and Rank-1 accuracy compared to prior UDA approaches (Pham et al., 4 Jan 2026).
1. Architectural Overview and Workflow
1.1 IQAGA
IQAGA adopts a two-stage workflow:
- Stage I (Image-Level Adaptation): StarGAN is trained to translate source images into target camera styles by optimizing an adversarial loss ($\mathcal{L}_{adv}$), a domain-classification loss ($\mathcal{L}_{cls}$), a cycle-reconstruction loss ($\mathcal{L}_{cyc}$), an identity-mapping loss ($\mathcal{L}_{idt}$), and an identity-preserving color loss ($\mathcal{L}_{color}$). For each source image, style-transferred versions are generated for the target camera styles and combined with the original source set to form the augmented training set.
- Stage II (Supervised Training on Source): A ResNet-50 backbone extracts 2048-dimensional features. Supervised learning uses cross-entropy ($\mathcal{L}_{ce}$) and triplet loss ($\mathcal{L}_{tri}$), with sample-level reweighting based on normalized image quality assessment (IQA) scores $q_i \in [0, 1]$. The IQA-weighted loss ($\mathcal{L}_{IQA}$) increases the contribution of high-quality samples (a minimal sketch follows).
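A minimal sketch of the Stage II objective, assuming each sample arrives with a normalized quality score $q_i \in [0, 1]$ produced by an external IQA model; the batch-hard triplet mining and the exact weighting scheme are illustrative assumptions, not the paper's verbatim formulation:

```python
import torch
import torch.nn.functional as F

def iqa_weighted_loss(logits, embeddings, labels, quality, margin=0.3):
    """Cross-entropy + batch-hard triplet loss, reweighted per sample by its IQA score.

    quality: (B,) tensor of normalized IQA scores in [0, 1]."""
    # Per-sample cross-entropy (no reduction, so each sample can be reweighted).
    ce = F.cross_entropy(logits, labels, reduction="none")            # (B,)

    # Batch-hard triplet loss over pairwise embedding distances.
    dist = torch.cdist(embeddings, embeddings)                        # (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)                 # (B, B)
    hardest_pos = (dist * same.float()).max(dim=1).values             # farthest same-ID sample
    hardest_neg = (dist + same.float() * 1e9).min(dim=1).values       # closest different-ID sample
    tri = F.relu(hardest_pos - hardest_neg + margin)                  # (B,)

    # High-quality samples contribute more; weights are rescaled to mean 1.
    weights = quality / (quality.sum() + 1e-12) * quality.numel()
    return (weights * (ce + tri)).mean()
```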
1.2 DAPRH
DAPRH extends IQAGA with additional target-side adaptation modules:
- Stage I (Source Training): StarGAN-based augmentation is used, but only a fraction of real:GAN images per batch (e.g., 4:1) are included to reduce GAN-induced noise. A domain-invariant mapping (DIM) module employs adversarial training between a domain classifier and the feature extractor to learn domain-confused features.
- Stage II (Target Unsupervised Training): Unlabeled target features are extracted and clustered (DBSCAN) to assign pseudo-labels $\tilde{y}_i$. Enhanced queries are generated through a Vision Transformer (ViT), integrating global (CLS token) and top-K local features. Pseudo-label refinement (CRL) updates labels with soft cluster probabilities and silhouette scores. Further, a teacher-student framework with EMA parameter transfer, camera-aware proxies (CAP), and contrastive objectives is employed for robust target learning (the holistic-feature step is sketched below).
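A compact sketch of the holistic-feature composition, assuming a ViT that exposes a CLS token and patch tokens; using the token L2 norm as the saliency score for top-K selection is an assumption made here for illustration:

```python
import torch
import torch.nn.functional as F

def holistic_feature(cls_token, patch_tokens, top_ratio=0.4):
    """Fuse the global CLS token with the top-K most salient local patch tokens.

    cls_token:    (B, D) global ViT representation
    patch_tokens: (B, N, D) local patch representations"""
    # Rank patch tokens by L2 norm as a simple saliency proxy (assumption).
    k = max(1, int(patch_tokens.size(1) * top_ratio))
    saliency = patch_tokens.norm(dim=-1)                               # (B, N)
    top_idx = saliency.topk(k, dim=1).indices                          # (B, k)
    top_local = torch.gather(
        patch_tokens, 1,
        top_idx.unsqueeze(-1).expand(-1, -1, patch_tokens.size(-1)),
    )                                                                   # (B, k, D)

    # Concatenate the global token with the pooled local tokens.
    local_pooled = top_local.mean(dim=1)                                # (B, D)
    return F.normalize(torch.cat([cls_token, local_pooled], dim=-1), dim=-1)
```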
2. Loss Functions and Optimization
2.1 IQAGA Losses
- StarGAN Generator Loss: $\mathcal{L}_G = \mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}_{cls} + \lambda_{cyc}\,\mathcal{L}_{cyc} + \lambda_{idt}\,\mathcal{L}_{idt} + \lambda_{color}\,\mathcal{L}_{color}$ (a code sketch of these terms follows this list)
- Discriminator Loss: $\mathcal{L}_D = -\mathcal{L}_{adv} + \lambda_{cls}\,\mathcal{L}_{cls}^{real}$
- Source Training Loss: $\mathcal{L}_{src} = \mathcal{L}_{ce} + \mathcal{L}_{tri}$
- For each sample $x_i$, the weight is its normalized IQA score $q_i \in [0, 1]$; Total loss: $\mathcal{L}_{IQA} = \frac{1}{N}\sum_{i=1}^{N} q_i\,\big(\mathcal{L}_{ce}(x_i) + \mathcal{L}_{tri}(x_i)\big)$
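The generator objective can be sketched as follows; the generator/discriminator interfaces `G(x, c)` and `D(x)`, the WGAN-style adversarial term, the mean-color formulation of the identity-preserving color loss, and the loss weights are all illustrative assumptions rather than the paper's exact settings:

```python
import torch.nn.functional as F

def stargan_generator_loss(G, D, x_src, c_src, c_tgt,
                           lam_cyc=10.0, lam_idt=10.0, lam_color=1.0):
    """One generator step of the style-transfer stage (illustrative weights).

    G(x, c): generator conditioned on a target camera-style code c
    D(x):    returns (real/fake score, camera-style logits)"""
    x_fake = G(x_src, c_tgt)
    score_fake, style_logits = D(x_fake)

    adv = -score_fake.mean()                              # fool the discriminator
    cls = F.cross_entropy(style_logits, c_tgt)            # match the target camera style
    cyc = F.l1_loss(G(x_fake, c_src), x_src)              # cycle reconstruction
    idt = F.l1_loss(G(x_src, c_src), x_src)               # identity mapping
    # Identity-preserving color term: keep per-channel color statistics of the input.
    color = F.l1_loss(x_fake.mean(dim=(2, 3)), x_src.mean(dim=(2, 3)))

    return adv + cls + lam_cyc * cyc + lam_idt * idt + lam_color * color
```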
2.2 DAPRH Losses
- Domain-Invariant Mapping: $\mathcal{L}_{DIM} = \mathcal{L}_{ce} + \mathcal{L}_{tri} - \lambda_{d}\,\mathcal{L}_{dom}$, where $\mathcal{L}_{dom}$ is the domain classifier's cross-entropy and the feature extractor is trained adversarially against it (a gradient-reversal sketch follows this list)
- Target Unsupervised Loss: $\mathcal{L}_{tgt} = \mathcal{L}_{cc} + \lambda_{cam}\,\mathcal{L}_{cam} + \lambda_{kl}\,\mathcal{L}_{KL} + \lambda_{stri}\,\mathcal{L}_{stri}$
- $\mathcal{L}_{cc}$: Instance-to-cluster contrastive loss
- $\mathcal{L}_{cam}$: Camera-aware contrastive loss
- $\mathcal{L}_{KL}$: KL divergence loss between Teacher and Student
- $\mathcal{L}_{stri}$: Soft triplet loss on features
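One common way to realize the adversarial game in the domain-invariant mapping is a gradient-reversal layer between the backbone and a small domain classifier; this is a generic sketch under that assumption, not necessarily the paper's exact implementation:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

class DomainClassifier(nn.Module):
    """Predicts source vs. target domain from backbone features through gradient reversal."""
    def __init__(self, dim=2048, lam=1.0):
        super().__init__()
        self.lam = lam
        self.head = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 2))

    def forward(self, feats):
        # Reversed gradients push the backbone toward domain-confused features.
        return self.head(GradReverse.apply(feats, self.lam))
```

A standard cross-entropy on the domain logits then trains the classifier while the reversed gradients steer the backbone toward domain-invariant features.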
3. Role of Pseudo-Labeling and Refinement
- IQAGA: Does not use pseudo-labels in its training; adaptation is achieved exclusively by GAN-based augmentation and IQA reweighting.
- DAPRH: Applies clustering (DBSCAN) to infer hard pseudo-labels, which are then refined using cluster soft probabilities (based on distance to cluster centers), silhouette scores for filtering, and label smoothing: $\tilde{y}_i = \beta\,\hat{y}_i + (1-\beta)\,p_i$, where $\hat{y}_i$ is the hard one-hot label, $p_i$ the soft cluster probability, and $\beta$ the smoothing coefficient. The pseudo-label refinement is applied iteratively each epoch, with a Teacher-Student model facilitating stable target learning (a clustering-and-refinement sketch follows).
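A minimal sketch of the clustering-and-refinement loop, assuming L2-normalized target features, Euclidean DBSCAN, and a simple silhouette threshold; the `eps`, threshold, soft-probability, and smoothing forms are placeholder assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_samples

def pseudo_labels_with_refinement(features, eps=0.6, min_samples=8, smooth=0.6):
    """Cluster target features, filter unreliable samples, and build soft labels.

    features: (N, D) array of L2-normalized target-domain embeddings."""
    hard = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    keep = hard != -1                                        # DBSCAN marks outliers with -1

    # Per-sample silhouette score; low values indicate unreliable assignments.
    sil = np.full(len(hard), -1.0)
    if len(np.unique(hard[keep])) > 1:
        sil[keep] = silhouette_samples(features[keep], hard[keep])
    keep &= sil > 0.0                                        # filtering threshold is an assumption

    # Soft cluster probabilities from distances to cluster centers.
    ids = np.unique(hard[keep])
    centers = np.stack([features[hard == c].mean(axis=0) for c in ids])
    dist = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
    probs = np.exp(-dist) / np.exp(-dist).sum(axis=1, keepdims=True)

    # Blend the hard one-hot label with the soft probabilities (assumed smoothing form).
    one_hot = np.zeros_like(probs)
    col = {c: j for j, c in enumerate(ids)}
    for i in np.flatnonzero(keep):
        one_hot[i, col[hard[i]]] = 1.0
    soft = smooth * one_hot + (1.0 - smooth) * probs
    return hard, keep, soft
```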
4. Integration of Domain-Invariant and Holistic Feature Modules
- IQAGA: Does not incorporate domain-invariant mapping or holistic features.
- DAPRH: Uses a domain classifier for adversarial domain confusion during source training. During target training, features are enhanced via ViT-based holistic representations, incorporating both global and local spatial tokens. Further, CAP (camera-aware proxies) decomposes clusters by camera, and contrastive losses are employed at both cluster and camera levels (see the proxy-loss sketch below).
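The camera-aware proxy idea can be sketched as below: each cluster is split per camera into proxies, and every sample is contrasted against all proxies with its own (cluster, camera) proxy as the positive. Computing proxies from the current batch rather than a feature memory, and the temperature value, are simplifying assumptions:

```python
import torch
import torch.nn.functional as F

def camera_aware_proxy_loss(feats, pseudo_ids, cam_ids, temperature=0.07):
    """Contrastive loss against per-(cluster, camera) proxy centroids.

    feats:      (B, D) L2-normalized embeddings
    pseudo_ids: (B,) long tensor of cluster pseudo-labels
    cam_ids:    (B,) long tensor of camera indices"""
    # Build one proxy per (cluster, camera) pair present in the batch.
    keys = torch.stack([pseudo_ids, cam_ids], dim=1)
    uniq, inverse = torch.unique(keys, dim=0, return_inverse=True)
    proxies = torch.zeros(len(uniq), feats.size(1), device=feats.device)
    proxies.index_add_(0, inverse, feats)
    counts = torch.bincount(inverse, minlength=len(uniq)).clamp(min=1).unsqueeze(1)
    proxies = F.normalize(proxies / counts, dim=1)

    # Standard InfoNCE: each sample's positive is its own (cluster, camera) proxy.
    logits = feats @ proxies.t() / temperature
    return F.cross_entropy(logits, inverse)
```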
5. Training Protocols and Hyperparameters
IQAGA
- GAN Augmentation: StarGAN trained with the Adam optimizer, batch size 16.
- Supervised Training: ResNet-50 pretrained on ImageNet, Adam optimizer (learning rate reduced by 0.1 at epochs 40 and 70), batch size 128 (16 IDs × 8 images), 120 epochs, margin-based triplet loss with IQA reweighting.
DAPRH
- GAN Augmentation: Same as IQAGA.
- Source & DIM: Batch size 128 (16 IDs × 8 images), .
- Target Training: DBSCAN (Market), minPts=8 (Market), 16 (MSMT), top-K local tokens (top 40%), , , , , EMA momentum , 50–80 epochs, SGD .
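The teacher-student parameter transfer reduces to an exponential moving average over the student's weights; the momentum value below is a placeholder, since the paper's exact setting is not reproduced here:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    """Update the teacher as an exponential moving average of the student.

    teacher, student: torch.nn.Module instances with identical architectures."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)  # e.g. BatchNorm running statistics follow the student directly
```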
6. Empirical Results and Comparative Analysis
Below are reported results on key cross-domain benchmarks, measuring mean Average Precision (mAP) and Rank-1 accuracy.
| Method/Stage | Market→Duke (mAP / R1) | Duke→Market (mAP / R1) | Market→MSMT (mAP / R1) | Duke→MSMT (mAP / R1) |
|---|---|---|---|---|
| Baseline (ce+triplet) | 25.8% / 43.7% | 26.2% / 55.3% | – | – |
| +GAN only | 31.5% / 54.2% | 35.1% / 68.6% | – | – |
| +GAN+IQA (IQAGA final) | 32.1% / 55.5% | 36.3% / 70.2% | – | – |
| +GAN+DIM | 35.4% / 58.2% | – | – | – |
| DAPRH final | 72.0% / 83.7% | 85.9% / 94.4% | 35.8% / 64.8% | 36.0% / 65.5% |
GAN augmentation alone yields strong improvements over the baseline, with IQA-driven weighting (IQAGA) adding a further ~1% mAP. In DAPRH, the inclusion of holistic features, camera-aware proxies, CRL, and EMA teacher-student learning delivers major additional gains (e.g., 72.0% mAP on Market→Duke). DAPRH narrows the gap to fully supervised methods (whose state of the art exceeds 98% when trained directly on the target domain) and surpasses earlier UDA results, which had plateaued at roughly 40% mAP on MSMT and 80% mAP on Market/Duke (Pham et al., 4 Jan 2026).
7. Qualitative Insights and Contributions
- GAN augmentation provides the largest early-stage gains in both methods, while DIM is computationally cheaper but less impactful alone.
- Holistic representations (EIR/ViT backbone), camera-aware proxies, CRL, and Teacher-Student learning each provide +2–4 mAP gains independently, and combining them yields a further +1–2 mAP.
- Key hyperparameters, namely the label-smoothing coefficient, the CAP loss weight, and the top-K local-token ratio, exhibit narrow optimal ranges; the label-smoothing coefficient peaks around $0.6$ and the top-K ratio around 40%.
- IQAGA demonstrates how simple GAN augmentation plus IQA weighting reaches 32–36% mAP on Market→Duke and Duke→Market, outperforming previous GAN-based UDA by 1–3 mAP. DAPRH, by integrating style transfer, domain alignment, richer feature composition, and robust pseudo-label refinement, achieves >70% mAP on Market→Duke and >85% mAP on Duke→Market (Pham et al., 4 Jan 2026).
The performance and methodology of IQAGA and DAPRH highlight that combining generative augmentation, domain-aligned representation learning, and sophisticated pseudo-labeling can bridge much of the cross-domain gap for ReID without requiring target-domain labels.