
SCM-ReID: GAN-based Domain Adaptation

Updated 11 January 2026
  • SCM-ReID is a framework that combines GAN-based image translation and IQA reweighting to address appearance variations in cross-domain person re-identification.
  • IQAGA utilizes StarGAN for style adaptation and a ResNet-50 backbone with IQA-weighted loss, reaching 32–36% mAP on benchmarks such as Market→Duke and Duke→Market.
  • DAPRH builds on IQAGA by integrating domain-invariant mapping, Vision Transformer-based holistic features, and pseudo-label refinement to boost mAP to over 70% on Market→Duke.

Person re-identification (ReID) under domain shift is a principal challenge for intelligent surveillance, primarily due to appearance variations and domain discrepancies between cameras or datasets. IQAGA (Image Quality–driven GAN Augmentation) and DAPRH (GAN Augmentation + Pseudo-Label Refinement + Holistic features) are two advanced unsupervised domain adaptation (UDA) methods that integrate generative augmentation, domain adaptation, and discriminative learning to improve cross-domain ReID performance. Both methods leverage StarGAN-based image translation, but they differ in their domain alignment and target-domain learning strategies, achieving substantial gains in mAP and Rank-1 accuracy compared to prior UDA approaches (Pham et al., 4 Jan 2026).

1. Architectural Overview and Workflow

1.1 IQAGA

IQAGA adopts a two-stage workflow:

  • Stage I (Image-Level Adaptation): StarGAN is trained to translate source images $x_s$ into the $C_t$ target camera styles $c$ by optimizing adversarial loss ($L_{adv}$), domain classification ($L_{cls}$), cycle reconstruction ($L_{rec}$), identity mapping ($L_{idt}$), and identity-preserving color loss ($L_{pid}$). For each $x_s$, $C_t$ style-transferred images are generated and combined with the original source set to form $D_{sync}$ (see the dataset-construction sketch after this list).
  • Stage II (Supervised Training on Source): A ResNet-50 backbone extracts 2048-dimensional features. Supervised learning is performed using cross-entropy ($L_{ce}$) and triplet loss ($L_{tri}$), with sample-level reweighting based on normalized image quality assessment (IQA) scores $z_i$. The IQA-weighted loss ($L_i$) increases the contribution of high-quality samples.
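
The snippet below is a minimal sketch of the Stage I dataset construction, assuming a trained StarGAN generator `G(x, c)` that maps an image tensor and a one-hot target-camera code to a style-transferred image; the function name and tensor shapes are illustrative, not the authors' code.

```python
import torch

def build_sync_dataset(generator, source_images, num_target_cams):
    """Translate each source image into every target-camera style and pool
    the results with the originals to form D_sync (illustrative sketch)."""
    generator.eval()
    d_sync = list(source_images)                        # keep the original source images
    with torch.no_grad():
        for img in source_images:                       # img: (3, H, W)
            for cam in range(num_target_cams):
                c = torch.zeros(1, num_target_cams)     # one-hot camera-style code
                c[0, cam] = 1.0
                fake = generator(img.unsqueeze(0), c)   # style-transferred image
                d_sync.append(fake.squeeze(0))
    return d_sync                                       # |D_sync| = (1 + C_t) * |source|
```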

1.2 DAPRH

DAPRH extends IQAGA with additional target-side adaptation modules:

  • Stage I (Source Training): StarGAN-based augmentation is used, but only a ratio $N{:}M$ of real to GAN-translated images per batch (e.g., 4:1) is included to reduce GAN-induced noise. A domain-invariant mapping (DIM) module employs adversarial training between a domain classifier $D_{net}$ and the feature extractor $f_e$ to learn domain-confused features.
  • Stage II (Target Unsupervised Training): Unlabeled target features are extracted and clustered (DBSCAN) to assign pseudo-labels $y_i$. Enhanced queries $v_i$ are generated through a Vision Transformer (ViT), integrating the global CLS token and top-K local features. Pseudo-label refinement (CRL) updates labels with soft cluster probabilities and silhouette scores. Further, a teacher-student framework with EMA parameter transfer (sketched after this list), camera-aware proxies (CAP), and contrastive objectives are employed for robust target learning.
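
As a concrete illustration of the EMA parameter transfer in the teacher-student stage, the following is a minimal PyTorch sketch; the momentum follows the $w = 0.99$ reported later, while the function name and buffer handling are assumptions.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, w=0.99):
    """Move the teacher's weights toward the student's by exponential
    moving average (sketch of the EMA parameter transfer)."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.data.mul_(w).add_(p_s.data, alpha=1.0 - w)
    for b_t, b_s in zip(teacher.buffers(), student.buffers()):
        b_t.data.copy_(b_s.data)   # copy running stats (e.g., BatchNorm) directly
```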

2. Loss Functions and Optimization

2.1 IQAGA Losses

  • StarGAN Generator Loss:

$$L_G = L_{adv}(G) + \lambda_{cls} L_{cls}(G) + \lambda_{rec} L_{rec} + \lambda_{idt} L_{idt} + \lambda_{pid} L_{pid}$$

  • Discriminator Loss:

$$L_D = L_{adv}(D) + \lambda_{cls} L_{cls}(D)$$

  • Source Training Loss:

For each sample, $z_i = \text{clamp}\!\left(\frac{f_i - \mu_f}{\sigma_f / h},\, -1,\, 1\right)$ with $h = 0.33$, and $L_i = (1 + A_z z_i) \cdot (L_{ce,i} + L_{tri,i})$. The total loss is $L_{source} = \sum_i L_i$ (see the sketch below).
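
A minimal sketch of the IQA-weighted source loss, assuming per-sample cross-entropy and triplet losses computed with `reduction='none'` and a vector of raw IQA scores $f_i$; names are illustrative.

```python
import torch

def iqa_weighted_loss(ce_per_sample, tri_per_sample, iqa_scores, a_z=0.8, h=0.33):
    """Reweight per-sample losses by the normalized IQA score:
    z_i = clamp((f_i - mu_f) / (sigma_f / h), -1, 1),
    L_i = (1 + A_z * z_i) * (L_ce,i + L_tri,i)."""
    mu, sigma = iqa_scores.mean(), iqa_scores.std()
    z = torch.clamp((iqa_scores - mu) / (sigma / h), -1.0, 1.0)
    per_sample = (1.0 + a_z * z) * (ce_per_sample + tri_per_sample)
    return per_sample.sum()   # L_source = sum_i L_i
```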

2.2 DAPRH Losses

  • Domain-Invariant Mapping (a loss sketch follows this list):

$$L_D = \mathbb{E}_{s \in \text{source}}\big[(D_{net}(f_s) - 1)^2\big] + \mathbb{E}_{t \in \text{target}}\big[D_{net}(f_t)^2\big]$$

$$L_{DIM} = \mathbb{E}_{s}\big[(D_{net}(f_s) - 0.5)^2\big] + \mathbb{E}_{t}\big[(D_{net}(f_t) - 0.5)^2\big]$$

  • Target Unsupervised Loss:
    • $L_{NCE}$: Instance-to-cluster contrastive loss
    • $L_{CAP}$: Camera-aware contrastive loss
    • $L_{KL}$: KL divergence loss between teacher and student
    • $L_{stri}$: Soft triplet loss on features
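
The least-squares DIM objectives above can be written out as in the sketch below, assuming `d_net` outputs one scalar per feature; the alternation between updating the classifier and the feature extractor is only indicated in comments.

```python
def dim_losses(d_net, feats_src, feats_tgt):
    """Adversarial domain-confusion objectives (sketch).
    L_D trains the domain classifier (source -> 1, target -> 0);
    L_DIM trains the feature extractor to push both toward 0.5."""
    p_s, p_t = d_net(feats_src), d_net(feats_tgt)
    loss_d = ((p_s - 1.0) ** 2).mean() + (p_t ** 2).mean()
    loss_dim = ((p_s - 0.5) ** 2).mean() + ((p_t - 0.5) ** 2).mean()
    # In practice L_D updates only d_net (features detached) and
    # L_DIM updates only the feature extractor, in alternating steps.
    return loss_d, loss_dim
```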

3. Role of Pseudo-Labeling and Refinement

  • IQAGA: Does not use pseudo-labels in its training; adaptation is achieved exclusively by GAN-based augmentation and IQA reweighting.
  • DAPRH: Applies clustering (DBSCAN) to infer hard pseudo-labels, which are then refined using cluster soft probabilities $G_i$ (based on distance to cluster centers), silhouette scores for filtering, and label smoothing: $\hat{y}_i = (1 - \alpha) y_i + \alpha G_i$ (see the sketch below). Pseudo-label refinement is applied iteratively each epoch, with a teacher-student model facilitating stable target learning.
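
A simplified sketch of the clustering and label-refinement step, assuming L2 feature distances and a softmax over negative center distances for the soft probabilities $G_i$; the exact distance measure and the silhouette-based filtering are omitted here.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def refined_pseudo_labels(features, eps=0.6, min_samples=8, alpha=0.4):
    """DBSCAN hard pseudo-labels, then smoothing with soft cluster
    probabilities: y_hat_i = (1 - alpha) * y_i + alpha * G_i (sketch)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    cluster_ids = sorted(set(labels) - {-1})             # -1 marks DBSCAN outliers
    centers = np.stack([features[labels == c].mean(axis=0) for c in cluster_ids])

    refined = {}
    for i, lab in enumerate(labels):
        if lab == -1:
            continue                                     # outliers are left unlabeled
        dists = np.linalg.norm(centers - features[i], axis=1)
        g_i = np.exp(-dists) / np.exp(-dists).sum()      # soft cluster probabilities G_i
        y_i = np.eye(len(cluster_ids))[cluster_ids.index(lab)]
        refined[i] = (1 - alpha) * y_i + alpha * g_i     # smoothed pseudo-label
    return labels, refined
```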

4. Integration of Domain-Invariant and Holistic Feature Modules

  • IQAGA: Does not incorporate domain-invariant mapping or holistic features.
  • DAPRH: Uses a domain classifier $D_{net}$ for adversarial domain confusion during source training. During target training, features are enhanced via ViT-based holistic representations that integrate both the global CLS token and local spatial tokens (sketched below). Further, CAP (camera-aware proxies) decomposes clusters by camera, and contrastive losses are employed at both cluster and camera levels.
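
A sketch of how a holistic query can be assembled from ViT outputs: the global CLS token is concatenated with a pooled selection of the top-K local tokens. Ranking local tokens by L2 norm is an assumption made here for illustration; the paper's exact selection criterion may differ.

```python
import torch

def holistic_feature(vit_tokens, k_ratio=0.4):
    """Combine the CLS token with the top-K local patch tokens.
    vit_tokens: (B, 1 + N, D), CLS token first (sketch)."""
    cls_tok, patches = vit_tokens[:, 0], vit_tokens[:, 1:]      # (B, D), (B, N, D)
    k = max(1, int(patches.size(1) * k_ratio))                  # keep ~top 40% of tokens
    scores = patches.norm(dim=-1)                               # (B, N) saliency proxy
    top_idx = scores.topk(k, dim=1).indices                     # (B, k)
    gathered = torch.gather(
        patches, 1, top_idx.unsqueeze(-1).expand(-1, -1, patches.size(-1)))
    local = gathered.mean(dim=1)                                # pooled local feature
    return torch.cat([cls_tok, local], dim=-1)                  # (B, 2D) holistic query
```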

5. Training Protocols and Hyperparameters

IQAGA

  • GAN Augmentation: StarGAN trained with Adam (lr $= 3.5 \times 10^{-5}$, $\lambda_{cls}=1$, $\lambda_{rec}=10$, $\lambda_{idt}=1$, $\lambda_{pid}=10$), batch size 16.
  • Supervised Training: ResNet-50 pretrained on ImageNet, Adam (lr $= 3.5 \times 10^{-4}$, reduced by 0.1 at epochs 40 and 70), batch size 128 (16 IDs × 8 images), 120 epochs, triplet margin $\alpha = 0.3$, IQA reweighting $A_z = 0.8$ (a configuration sketch follows below).
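
A minimal configuration sketch matching the reported values; the choice of `MultiStepLR` is an assumption consistent with "reduced by 0.1 at epochs 40 and 70", and the training-loop body is omitted.

```python
import torch
from torchvision.models import resnet50

model = resnet50(weights="IMAGENET1K_V1")           # ImageNet-pretrained backbone
optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 70], gamma=0.1)      # lr * 0.1 at epochs 40 and 70

for epoch in range(120):
    # ... one epoch over batches of 128 (16 IDs x 8 images) with the
    # IQA-weighted cross-entropy + triplet (margin 0.3) objective ...
    scheduler.step()
```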

DAPRH

  • GAN Augmentation: Same as IQAGA.
  • Source & DIM: Batch size 128 (16 IDs × 8 images), $\lambda_{DIM} = 0.1$.
  • Target Training: DBSCAN $\epsilon = 0.6$ (Market), minPts $= 8$ (Market) / $16$ (MSMT), top-K local tokens $K = 5$ (top 40%), $\alpha = 0.4$, $\gamma = 1.0$, $\beta_{KL} = 0.4$, $\beta_{tri} = 0.8$, EMA momentum $w = 0.99$, 50–80 epochs, SGD with lr $= 1 \times 10^{-3}$ (a loss-combination sketch follows below).
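
The summary does not spell out how the target-stage objectives are combined; the following is one plausible weighting using the reported coefficients, labeled as an assumption, together with the reported SGD setting.

```python
import torch

def daprh_target_loss(l_nce, l_cap, l_kl, l_stri,
                      gamma=1.0, beta_kl=0.4, beta_tri=0.8):
    """One plausible combination of the target objectives (assumed form):
    L = L_NCE + gamma * L_CAP + beta_KL * L_KL + beta_tri * L_stri."""
    return l_nce + gamma * l_cap + beta_kl * l_kl + beta_tri * l_stri

# Reported optimizer for the target stage (sketch):
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```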

6. Empirical Results and Comparative Analysis

Below are reported results on key cross-domain benchmarks, measuring mean Average Precision (mAP) and Rank-1 accuracy.

| Method / Stage | Market→Duke (mAP / R1) | Duke→Market (mAP / R1) | Market→MSMT (mAP / R1) | Duke→MSMT (mAP / R1) |
|---|---|---|---|---|
| Baseline (ce + triplet) | 25.8% / 43.7% | 26.2% / 55.3% | – | – |
| + GAN only | 31.5% / 54.2% | 35.1% / 68.6% | – | – |
| + GAN + IQA (IQAGA final) | 32.1% / 55.5% | 36.3% / 70.2% | – | – |
| + GAN + DIM | 35.4% / 58.2% | – | – | – |
| DAPRH final | 72.0% / 83.7% | 85.9% / 94.4% | 35.8% / 64.8% | 36.0% / 65.5% |

GAN augmentation alone yields strong improvements over the baseline, with IQA-driven weighting (IQAGA) adding a further ~1 point of mAP. In DAPRH, the inclusion of holistic features, camera-aware proxies, CRL, and EMA achieves major additional improvements (e.g., 72.0% mAP on Market→Duke). DAPRH narrows the gap to fully supervised methods (where state-of-the-art models exceed 98% within a single domain) and surpasses earlier UDA results, which capped at roughly ≤40% mAP on MSMT and ≤80% mAP on Market/Duke (Pham et al., 4 Jan 2026).

7. Qualitative Insights and Contributions

  • GAN augmentation provides the largest early-stage gains in both methods, while DIM is computationally cheaper but less impactful alone.
  • Holistic representations (EIR/ViT backbone), camera-aware proxies, CRL, and teacher-student learning each contribute +2–4 mAP independently, and their combination yields a further +1–2 mAP.
  • Key hyperparameters such as $\alpha$ (label smoothing), $\gamma$ (CAP loss weight), and the top-K local ratio exhibit optimal ranges: $\alpha \approx 0.4$–$0.6$, $\gamma \approx 1.0$, and top-K $\approx 0.4$.
  • IQAGA demonstrates how simple GAN+IQA weighting yields 32–36% mAP on Market→Duke and Duke→Market, outperforming previous GAN-based UDA by 1–3 mAP points. DAPRH, through integration of style transfer, domain alignment, advanced feature composition, and robust pseudo-label refinement, achieves >70% mAP on Market→Duke and >85% mAP on Duke→Market (Pham et al., 4 Jan 2026).

The performance and methodology of IQAGA and DAPRH highlight that combining generative augmentation, domain-aligned representation learning, and sophisticated pseudo-labeling can bridge much of the cross-domain gap for ReID without requiring target-domain labels.

