
IQAGA and DAPRH: UDA for Person ReID

Updated 11 January 2026
  • IQAGA and DAPRH are advanced unsupervised domain adaptation frameworks that use GAN-based augmentation to bridge the gap between source and target person re-ID data.
  • IQAGA employs a two-stage StarGAN-based style transfer with image quality weighting to mitigate artifacts and enhance training stability.
  • DAPRH integrates domain-invariant mapping, cluster-based pseudo-label refinement, and holistic ViT features, achieving significant mAP improvements on standard benchmarks.

IQAGA (Image Quality–Driven GAN Augmentation) and DAPRH (GAN Augmentation + Pseudo-Label Refinement + Holistic Features) are two advanced unsupervised domain adaptation (UDA) frameworks developed to address cross-domain generalization in person re-identification (ReID), where source and target data distributions diverge sharply due to appearance variation and camera-specific styles, and where target labels are unavailable. Both approaches use generative adversarial networks (GANs) for domain-specific image augmentation but diverge in loss composition, feature engineering, domain-invariant mapping, and pseudo-label mechanisms. Evaluated on standard ReID benchmarks, both frameworks show major improvements over prior UDA methods by systematically integrating augmentation, feature supervision, and robust target-data exploitation (Pham et al., 4 Jan 2026).

1. IQAGA: Image Quality–Driven GAN Augmentation

IQAGA centers on a two-stage workflow combining StarGAN-based style transfer and image-quality-weighted supervised learning. In Stage I, StarGAN models are trained to convert each source image $x_s$ into $C_t$ styles corresponding to the target camera domains, optimizing multiple objectives: adversarial ($L_{adv}$), domain classification ($L_{cls}$), cycle-reconstruction ($L_{rec}$), identity mapping ($L_{idt}$), and identity-preserving color ($L_{pid}$) losses. Each source image's translations are concatenated with the original source set to construct a synchronized training set $D_{sync}$.
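
A minimal sketch of the Stage I translation step, assuming a trained StarGAN generator `G(x, c)` that maps an image batch to a target camera style given a one-hot style code (the function name and interface are illustrative, not the paper's confirmed API):

```python
import torch

@torch.no_grad()
def build_sync_set(G, source_images, num_target_styles):
    """Translate each source image into every target camera style and
    concatenate the results with the originals to form D_sync.
    Identity labels are inherited by all translations."""
    translated = []
    for c in range(num_target_styles):
        # One-hot style code broadcast over the batch, as in StarGAN.
        style = torch.zeros(source_images.size(0), num_target_styles)
        style[:, c] = 1.0
        translated.append(G(source_images, style))
    return torch.cat([source_images] + translated, dim=0)
```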

In Stage II, a ResNet-50 backbone provides a 2048-D feature per image. Supervised learning optimizes cross-entropy ($L_{ce}$) and triplet ($L_{tri}$) losses, but with a per-sample IQA weight derived from normalized feature-vector statistics ($z_i$) that directly modulates each image's gradient contribution: $L_i = (1 + A_z z_i)\cdot(L_{ce,i} + L_{tri,i})$. Low-quality GAN samples thus exert reduced influence, mitigating mode collapse and spurious artifacts.
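
The weighting rule translates compactly into PyTorch. The sketch below assumes a per-image quality score `z` has already been computed, and uses a batch-hard triplet formulation, which is a common choice rather than the paper's confirmed variant:

```python
import torch
import torch.nn.functional as F

def iqa_weighted_loss(logits, features, labels, z, A_z=0.8, margin=0.3):
    """Per-sample IQA weighting: L_i = (1 + A_z * z_i) * (L_ce,i + L_tri,i).
    `z` holds the normalized per-image quality scores (assumed given)."""
    ce = F.cross_entropy(logits, labels, reduction='none')        # (B,)
    # Batch-hard triplet terms per anchor.
    dist = torch.cdist(features, features)                        # (B, B)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    hardest_pos = (dist * same.float()).max(dim=1).values
    hardest_neg = dist.masked_fill(same, float('inf')).min(dim=1).values
    tri = F.relu(hardest_pos - hardest_neg + margin)              # (B,)
    return ((1.0 + A_z * z) * (ce + tri)).sum()
```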

A key design choice in IQAGA is the deliberate omission of target pseudo-labeling and domain-invariant mapping: adaptation is driven purely by GAN-based augmentation and image-level loss engineering.

2. DAPRH: GAN Augmentation, Pseudo-Label Refinement, Holistic Features

DAPRH expands the GAN augmentation paradigm by combining (i) a domain-invariant mapping (DIM) adversarial feature alignment, (ii) cluster-based pseudo-labeling with refinement, (iii) holistic feature encoding via Vision Transformer (ViT), and (iv) camera-aware proxies.

Stage I uses StarGAN for style transfer as in IQAGA, but batch construction incorporates a reduced real-to-GAN ratio (e.g., 4:1), curbing GAN noise. DIM employs a domain classifier $D_{net}$ that is adversarially confounded about domain identity: the encoder $f_e$ is trained to push $D_{net}$'s predictions toward 0.5 for both source and target features ($L_{DIM}$).
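
The DIM objective itself is a simple two-term squared error, as the following sketch illustrates (`D_net` is assumed to output a domain probability in $[0, 1]$):

```python
import torch

def dim_loss(D_net, f_s, f_t):
    """Domain-invariant mapping objective for the encoder: push the domain
    classifier's outputs toward 0.5 on both source and target features so
    the two domains become indistinguishable."""
    return ((D_net(f_s) - 0.5) ** 2).mean() + ((D_net(f_t) - 0.5) ** 2).mean()
```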

In Stage II, target images are encoded, clustered by DBSCAN, and assigned hard pseudo-labels $y_i$. Features are transformed by a ViT/MLP head, merging the global CLS token with the top-K MaxPool-selected local tokens into expressive queries $v_i$. Cluster centers $m_k$ support soft label refinement $G_i$ (via a softmax over Euclidean distances), further filtered by the silhouette coefficient $s_i$. Refined labels take the weighted form $\hat{y}_i = (1-\alpha)y_i + \alpha G_i$. A teacher-student framework employs EMA to update the teacher weights, with student supervision via KL-divergence ($L_{KL}$) and soft triplet ($L_{stri}$) losses.
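
A sketch of the refinement rule, assuming DBSCAN hard labels and cluster centers are already computed; the silhouette-based filtering step described above is omitted for brevity:

```python
import torch
import torch.nn.functional as F

def refine_pseudo_labels(features, hard_labels, centers, alpha=0.4, t=1.0):
    """Soft pseudo-label refinement: G_i is a softmax over negative Euclidean
    distances to the cluster centers m_k, blended with the hard DBSCAN label
    as y_hat_i = (1 - alpha) * y_i + alpha * G_i."""
    dist = torch.cdist(features, centers)             # (N, K)
    G = F.softmax(-dist / t, dim=1)                   # soft assignments G_i
    y = F.one_hot(hard_labels, centers.size(0)).float()
    return (1.0 - alpha) * y + alpha * G
```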

DAPRH integrates camera-aware proxies by subdividing each cluster by camera ID, yielding sub-centers $c_{k,b}$ and an associated per-sample contrastive loss $L_{CAP}$ over the proxies. The unsupervised target objective aggregates the NCE, CAP, KL, and soft-triplet losses.
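
Illustratively, $L_{CAP}$ for a single query can be written as follows; the temperature value and tensor layout are assumptions for the sketch:

```python
import torch

def cap_loss(q, proxies, pos_idx, tau_c=0.07):
    """Camera-aware proxy loss for one sample: contrast the query against
    all camera sub-centers c_{k,b}, averaging the log-probabilities over the
    sample's positive proxy set P(i) (given here as index tensor pos_idx)."""
    logits = proxies @ q / tau_c                      # (num_proxies,)
    log_prob = logits - torch.logsumexp(logits, dim=0)
    return -log_prob[pos_idx].mean()
```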

3. Mathematical Formulations

IQAGA Losses

  • StarGAN Generator Loss:

$$L_G = L_{adv}(G) + \lambda_{cls} L_{cls}(G) + \lambda_{rec} L_{rec} + \lambda_{idt} L_{idt} + \lambda_{pid} L_{pid}$$

  • Discriminator Loss:

$$L_D = L_{adv}(D) + \lambda_{cls} L_{cls}(D)$$

  • IQA-weighted Source Loss:

$$L_{source} = \sum_i \left[ (1 + A_z z_i)\,(L_{ce,i} + L_{tri,i}) \right]$$

DAPRH Losses (Stage I and II)

  • DIM Loss:

$$L_{DIM} = E_s\!\left[(D_{net}(f_s)-0.5)^2\right] + E_t\!\left[(D_{net}(f_t)-0.5)^2\right]$$

  • ClusterNCE:

$$L_{NCE} = -\log \frac{\exp(q \cdot c^{+}/\tau)}{\sum_k \exp(q \cdot c_k/\tau)}$$

  • CAP Loss:

$$L_{CAP} = -\frac{1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(q \cdot c_p/\tau_c)}{\sum_k \exp(q \cdot c_k/\tau_c)}$$

  • Pseudo-label refinement:

$$\hat{y}_i = (1-\alpha)\,y_i + \alpha\,G_i, \qquad G_i[k] \propto \exp\!\left(-d_E(f_i, m_k)/t\right)$$

  • Teacher EMA update:

$$\theta_t \leftarrow w\,\theta_t + (1-w)\,\theta_s$$
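
For concreteness, the ClusterNCE contrast and the teacher EMA update translate directly into code; the temperature value here is an assumption:

```python
import torch

def cluster_nce(q, centers, pos_k, tau=0.05):
    """ClusterNCE for one query: contrast against all cluster centers,
    with the sample's own cluster center (index pos_k) as the positive."""
    logits = centers @ q / tau                        # (K,)
    return -(logits[pos_k] - torch.logsumexp(logits, dim=0))

@torch.no_grad()
def ema_update(teacher, student, w=0.99):
    """Teacher EMA update: theta_t <- w * theta_t + (1 - w) * theta_s."""
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(w).add_(p_s, alpha=1.0 - w)
```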

4. Training Protocols and Hyperparameterization

For both methods, StarGAN is trained on source-versus-target camera domain labels with the Adam optimizer, learning rate $3.5\times10^{-5}$, batch size 16, and loss weights $\lambda_{cls}=1$, $\lambda_{rec}=10$, $\lambda_{idt}=1$, $\lambda_{pid}=10$. ResNet-50 is initialized from ImageNet weights and trained with Adam at $3.5\times10^{-4}$ (decayed at epochs 40 and 70), batch size 128, for 120 epochs. The triplet margin is $\alpha = 0.3$ and the IQA weight is $A_z = 0.8$.
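
A hypothetical PyTorch rendering of the Stage II optimizer schedule; the decay factor of 0.1 is an assumption, since the text specifies only the milestones:

```python
import torch

# Stand-in classification head for the ResNet-50 backbone (751 is the
# Market-1501 training identity count; purely illustrative here).
model = torch.nn.Linear(2048, 751)
optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[40, 70], gamma=0.1)
```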

DAPRH source batch formation reserves an $N{:}M$ real-to-GAN ratio (e.g., 4:1) and uses $\lambda_{DIM}=0.1$ for the adversarial loss. Target clustering employs DBSCAN ($\epsilon=0.6$; MinPts = 8 for Market, 16 for MSMT) with batch size 128. The top-K local-token count is $K=5$ (roughly 40% of the total), with $\alpha=0.4$, $\gamma=1.0$, $\beta_{KL}=0.4$, and $\beta_{tri}=0.8$. Teacher EMA momentum is $w=0.99$, trained for 50–80 epochs with SGD at learning rate $1\times10^{-3}$.
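
The clustering step maps directly onto scikit-learn's DBSCAN. In the sketch below, the precomputed distance metric is an assumption: ReID pipelines commonly cluster over k-reciprocal Jaccard distances rather than raw feature distances:

```python
from sklearn.cluster import DBSCAN

def cluster_target(dist_matrix, min_pts=8):
    """DBSCAN over a precomputed pairwise distance matrix of target features
    (eps = 0.6 as in the text; min_pts = 8 for Market, 16 for MSMT)."""
    labels = DBSCAN(eps=0.6, min_samples=min_pts,
                    metric='precomputed').fit_predict(dist_matrix)
    return labels  # -1 marks outliers excluded from pseudo-labeling
```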

5. Experimental Results and Ablation Findings

Quantitative results illustrate substantial gains in cross-domain scenarios:

| Scenario | Baseline (mAP / Rank-1) | +GAN | +GAN+IQA / +DIM | DAPRH Final |
|---|---|---|---|---|
| Market→Duke | 25.8 / 43.7 | 31.5 / 54.2 | 32.1 / 55.5 | 72.0 / 83.7 |
| Duke→Market | 26.2 / 55.3 | 35.1 / 68.6 | 36.3 / 70.2 | 85.9 / 94.4 |
| Market→MSMT | — | — | — | 35.8 / 64.8 |
| Duke→MSMT | — | — | — | 36.0 / 65.5 |

Ablation analyses show:

  • GAN augmentation alone yields a $>10$ mAP improvement; IQA weighting adds another $\sim 1$ mAP in IQAGA.
  • In DAPRH, DIM is more computationally efficient than GAN augmentation for early-stage feature alignment, but integrating both is optimal.
  • Holistic (ViT) features and CAP each contribute +2–4 mAP; jointly, a further +1–2 mAP.
  • CRL and the teacher-student scheme add a further +1–2 mAP, which is crucial for scaling to large datasets.
  • Key hyperparameters exhibit clear optima: $\alpha \approx 0.4$–$0.6$, $\gamma \approx 1.0$, top-K ratio $\approx 0.4$.

This suggests that high-fidelity augmentation, loss weighting, domain confusion, and advanced pseudo-labeling together address critical bottlenecks in fully unsupervised ReID adaptation.

6. Contributions and Comparative Significance

IQAGA demonstrates that simple GAN-based augmentation, when combined with IQA-driven sample weighting, surpasses prior GAN-based UDA by 1–3 mAP. DAPRH incorporates multi-component alignment (style transfer, adversarial mapping, refined soft pseudo-labels, holistic feature representation, and camera-aware local proxies) and achieves more than 70 mAP on Market→Duke and nearly 86 mAP on Duke→Market, bridging much of the practical gap to fully supervised approaches. On the large-scale MSMT benchmark, DAPRH reaches roughly 36 mAP, surpassing previous unsupervised adaptation results.

A plausible implication is that multi-stage integration of style transfer, discriminative feature enhancement, cluster-based label refinement, and domain-invariant mapping forms an effective paradigm for cross-domain ReID without target labels. Further, the critical role of image quality assessment and proxy learning highlights the importance of robust sample and feature selection in deep UDA pipelines, a point of emerging significance for unsupervised visual recognition research (Pham et al., 4 Jan 2026).
