Sub-center ArcFace: Enhanced Angular Margin

Updated 5 April 2026
  • Sub-center ArcFace is a method that assigns multiple learnable sub-centers per class to capture intra-class variability and mitigate the impact of mislabeled data.
  • It uses a training mechanism where, for each sample, only the best-matching sub-center (the one with maximum cosine similarity) receives gradient updates, enabling effective noise isolation and data cleaning.
  • The approach has led to significant improvements in face and speaker verification benchmarks by refining class boundaries and enhancing model robustness.

Sub-center ArcFace is a robust extension of the ArcFace additive angular margin loss, designed to address class heterogeneity and label noise in large-scale face recognition and speaker verification tasks. Rather than associating each class with a single prototype on the hypersphere, sub-center ArcFace assigns multiple learnable sub-centers per class, enabling the model to explain intra-class variability, absorb mislabeled or noisy samples, and automatically isolate outlier distributions for downstream data cleaning and relabeling (Deng et al., 2018).

1. Mathematical Formulation

Let $x_i \in \mathbb{R}^d$ (or $\mathbf{e}_i$) be the $L_2$-normalized feature embedding of the $i$-th sample ($\|x_i\|_2 = 1$). For each class $j \in \{1, \ldots, N\}$, define $K$ normalized sub-centers $W_{j,1}, \ldots, W_{j,K} \in \mathbb{R}^d$ ($\|W_{j,k}\|_2 = 1$). Let $s > 0$ be a fixed scale and $m$ the additive angular margin.

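The sub-center ArcFace loss is given by:

$$\mathcal{L} = -\log \frac{e^{s \cos(\theta_{i,y_i} + m)}}{e^{s \cos(\theta_{i,y_i} + m)} + \sum_{j \neq y_i} e^{s \cos \theta_{i,j}}}$$

where $\theta_{i,j} = \arccos\left(\max_{k \in \{1, \ldots, K\}} W_{j,k}^{\top} x_i\right)$, and $y_i$ is the ground-truth class of sample $i$ (Deng et al., 2018, Qin et al., 2022, Baali et al., 25 Mar 2026).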

The angular margin $m$ is applied only to the logit of the ground-truth class, computed from that class's best-matching sub-center. For each sample and class, the maximum cosine similarity over all sub-centers serves as the effective logit.
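
The forward computation described above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: the function names `l2_normalize` and `subcenter_arcface_loss`, the `(N, K, d)` array layout for the sub-centers, and the defaults `s=64.0` and `m=0.5` are assumptions made for the example. The sketch also omits the safeguards some implementations add when the target angle plus the margin exceeds $\pi$.

```python
import numpy as np

def l2_normalize(a, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to unit L2 norm."""
    return a / np.maximum(np.linalg.norm(a, axis=axis, keepdims=True), eps)

def subcenter_arcface_loss(x, W, labels, s=64.0, m=0.5):
    """Mean sub-center ArcFace loss over a batch.

    x:      (B, d) embeddings
    W:      (N, K, d) sub-centers, K per class
    labels: (B,) integer class ids
    """
    x = l2_normalize(x)
    W = l2_normalize(W)
    # Cosine of every sample to every sub-center: (B, N, K).
    cos_all = np.einsum("bd,nkd->bnk", x, W)
    # Max-pool over sub-centers to get one cosine per class: (B, N).
    cos = np.clip(cos_all.max(axis=2), -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(x.shape[0])
    # Add the margin only to the ground-truth class angle.
    theta_m = theta.copy()
    theta_m[rows, labels] += m
    logits = s * np.cos(theta_m)
    # Numerically stable log-softmax followed by cross-entropy.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[rows, labels].mean()
```

Setting `K = 1` in this sketch reduces it to a plain ArcFace-style loss, which is one way to sanity-check the max-pooling step.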

2. Training Mechanism and Sub-center Assignment

For each mini-batch and each class $j$, compute the set of inner products $W_{j,k}^{\top} x_i$ for $k = 1, \ldots, K$. For each class $j$, select the sub-center $k_j^*$ yielding the highest score: $k_j^* = \arg\max_{k} W_{j,k}^{\top} x_i$.

  • Forward pass: Retain $\max_{k} W_{j,k}^{\top} x_i$ for all classes $j$.
  • Backward pass: Only the “winning” sub-center $W_{j,k_j^*}$ receives the gradient update for sample $i$; all others remain unchanged for that sample (Deng et al., 2018, Baali et al., 25 Mar 2026).
  • After convergence: For data cleaning, retain only the “dominant” sub-center (the one to which the majority of the class's samples are assigned) per class and discard samples whose angle to the dominant center exceeds a threshold (e.g., 75°).

This mechanism operates identically for all classes, regardless of whether they are the correct label or impostors, ensuring true sample-cluster associations drive the update.
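
The winner-take-all routing can be illustrated with a hand-written backward step. The sketch below is an assumption-laden simplification: `winning_subcenters` and `route_gradient` are hypothetical helper names, the sub-centers are treated as free parameters (the Jacobian of the weight normalization is ignored), and the upstream gradient is simply passed in as an array.

```python
import numpy as np

def winning_subcenters(x, W):
    """Index of the best-matching sub-center for every (sample, class) pair.

    x: (B, d) unit-norm embeddings
    W: (N, K, d) unit-norm sub-centers
    Returns an integer array of shape (B, N).
    """
    cos_all = np.einsum("bd,nkd->bnk", x, W)
    return cos_all.argmax(axis=2)

def route_gradient(x, W, grad_cos):
    """Back-propagate through the max over sub-centers.

    grad_cos: (B, N) upstream gradient w.r.t. the pooled class cosine.
    Returns dW with shape (N, K, d); non-winning sub-centers receive zeros.
    """
    K = W.shape[1]
    k_star = winning_subcenters(x, W)      # (B, N)
    one_hot = np.eye(K)[k_star]            # (B, N, K), 1 at the winner
    # d cos[b, n] / d W[n, k] equals x[b] when k is the winner, else 0.
    return np.einsum("bn,bnk,bd->nkd", grad_cos, one_hot, x)
```

In an automatic-differentiation framework the same behavior falls out of taking a maximum over the sub-center axis, so no explicit masking is normally needed.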

3. Role of Dominant and Non-dominant Sub-centers in Noise Isolation

The sub-center scheme divides each class into $K$ clusters on the unit hypersphere. In noisy datasets, the majority of clean data for class $j$ forms a tight cluster around one dominant sub-center, while hard, atypical, or mislabeled samples are drawn toward non-dominant sub-centers.

After model convergence:

  • Dominant sub-center: Represents the clean, well-aligned core of each class.
  • Non-dominant sub-centers: Absorb ambiguous or mislabeled outliers, effectively separating label noise from useful data (Deng et al., 2018, Qin et al., 2022).

This separation allows automatic data purification by pruning samples distant from the dominant sub-center, and retraining on the resulting cleaned dataset yields substantial generalization improvements.
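
The pruning step can be sketched per class as follows. The helper name `clean_class` and its return convention are assumptions for illustration; the 75° default mirrors the threshold quoted in this article, and the dominant sub-center is taken to be the one with the most assigned samples.

```python
import numpy as np

def clean_class(x, W_j, angle_threshold_deg=75.0):
    """Build a keep-mask for the samples labelled as one class.

    x:   (n, d) unit-norm embeddings labelled as class j
    W_j: (K, d) unit-norm sub-centers of class j
    Returns (keep, dominant): a boolean mask of shape (n,) and the
    index of the dominant sub-center.
    """
    cos_all = x @ W_j.T                          # (n, K)
    assigned = cos_all.argmax(axis=1)            # nearest sub-center per sample
    counts = np.bincount(assigned, minlength=W_j.shape[0])
    dominant = int(counts.argmax())              # sub-center holding the majority
    cos_dom = np.clip(cos_all[:, dominant], -1.0, 1.0)
    angles = np.degrees(np.arccos(cos_dom))
    # Discard samples whose angle to the dominant center exceeds the threshold.
    return angles <= angle_threshold_deg, dominant
```

Samples where the mask is `False` would be dropped (or queued for relabeling) before retraining on the cleaned set.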

4. Geometric Interpretation on the Hypersphere

All features and sub-center weights are constrained to the unit hypersphere in $\mathbb{R}^d$. Each class is no longer a single point, but a constellation of $K$ points. The intra-class angular distribution, potentially multi-modal due to pose, lighting, or noise, is modeled as a mixture of clusters.

The margin $m$ is still enforced at the angular (geodesic) level between the sample and its closest sub-center, which enhances inter-class discrimination while permitting within-class diversity (Deng et al., 2018).
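
As a worked reading of this geometry, write $\theta_{i,j}$ for the angle between sample $x_i$ and the nearest sub-center of class $j$, $y_i$ for the ground-truth class, $s$ for the scale, and $m$ for the margin. Assuming $\theta_{i,y_i} + m \le \pi$, so that the cosine is monotonically decreasing over the relevant range, the target logit exceeds the logit of a competing class $j$ exactly when the sample is closer to its own class by at least the margin:

```latex
s\cos(\theta_{i,y_i} + m) > s\cos\theta_{i,j}
\quad\Longleftrightarrow\quad
\theta_{i,y_i} + m < \theta_{i,j},
\qquad
\theta_{i,j} = \min_{k} \arccos\left(W_{j,k}^{\top} x_i\right).
```

The last identity holds because $\arccos$ is decreasing, so the maximum cosine over sub-centers corresponds to the minimum angle.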

5. Applications: Face Recognition, Speaker Verification, and Noisy Data Regimes

Sub-center ArcFace was initially developed for deep face recognition under massive label noise (e.g., web-scraped MS1M-V0 at 50% label noise) (Deng et al., 2018). Its utility in noisy or poorly-labeled settings has led to adoption in speaker verification, especially under semi-supervised domain adaptation schemes with clustering-derived pseudo-labels.

  • Face Recognition: Training ResNet-50 with Sub-center ArcFace raises the true positive rate at a fixed false positive rate (TPR@FPR) on IJB-C over the ArcFace baseline. Automatic cleaning with sub-center-based hard pruning and re-training pushes performance higher still, nearly matching models trained with fully human-labeled data.
  • Speaker Verification: In domain adaptation on pseudo-labeled CN-Celeb, switching from ArcFace to Sub-center ArcFace reduced EER, and further improvements were achieved by combining with AS-Norm and QMF back-ends (Qin et al., 2022, Baali et al., 25 Mar 2026).
  • Curriculum Learning: Recent systems leverage the dominant sub-center cosine as a per-sample confidence score to rank and schedule training examples (easy/medium/hard) for adaptive curriculum loss weighting (Baali et al., 25 Mar 2026).

6. Implementation Details, Hyper-parameters, and Empirical Findings

<table>
<thead>
<tr> <th>Parameter</th> <th>Typical Value</th> <th>Significance</th> </tr>
</thead>
<tbody>
<tr> <td>Sub-centers per class ($K$)</td> <td>3</td> <td>Isolates dominant and outlier modes; larger $K$ usually hurts</td> </tr>
<tr> <td>Scale ($s$)</td> <td>32 (speaker), 64 (face)</td> <td>Inherited from ArcFace for margin sharpness</td> </tr>
<tr> <td>Angular margin ($m$)</td> <td>0.2–0.5</td> <td>Greater $m$ strengthens decision boundaries</td> </tr>
<tr> <td>Angle threshold</td> <td>75° (for data cleaning)</td> <td>Robustly prunes high-confidence noise and outliers</td> </tr>
</tbody>
</table>
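
The table's values can be collected into a small configuration object. The sketch below is only a convenience: the class name `SubcenterArcFaceConfig`, the field names, the validation rules, and the specific margins picked from the 0.2–0.5 range are assumptions, and the margin is assumed to be expressed in radians.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubcenterArcFaceConfig:
    num_subcenters: int = 3        # K, sub-centers per class
    scale: float = 64.0            # s; the table lists 64 (face), 32 (speaker)
    margin: float = 0.5            # m, assumed radians; table range is 0.2-0.5
    clean_angle_deg: float = 75.0  # angle threshold for data cleaning

    def __post_init__(self):
        # Basic sanity checks on the hyper-parameters.
        if self.num_subcenters < 1:
            raise ValueError("num_subcenters must be >= 1")
        if self.scale <= 0:
            raise ValueError("scale must be positive")
        if self.margin < 0:
            raise ValueError("margin must be non-negative")

# Illustrative presets; the margins are picked from the table's range.
FACE = SubcenterArcFaceConfig()
SPEAKER = SubcenterArcFaceConfig(scale=32.0, margin=0.2)
```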

Other implementation notes:

  • Only max-pooling over sub-centers (not softmax-weighted pooling) yielded optimal results (Deng et al., 2018).
  • Second-round clustering and fine-tuning in semi-supervised settings may degrade final accuracy (Qin et al., 2022).
  • In curriculum approaches, per-sample confidence is tracked via moving average and standard deviation; tiered weights are adaptively scheduled (Baali et al., 25 Mar 2026).
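
One way the confidence tracking in the last note might look is sketched below. Everything specific here is an assumption rather than a detail from the cited work: the class name `ConfidenceTiers`, the exponential moving average with `momentum=0.9`, the mean plus or minus one standard deviation cut-offs, and the tier weights are all hypothetical choices for illustration. The confidence input is taken to be each sample's cosine to its class's dominant sub-center.

```python
import numpy as np

class ConfidenceTiers:
    """Running mean/variance of confidence scores, used to tier samples.

    The EMA form, the mean +/- std cut-offs, and the tier weights are
    illustrative assumptions, not values from the cited work.
    """

    def __init__(self, momentum=0.9, weights=(1.0, 0.7, 0.4)):
        self.momentum = momentum
        self.weights = np.asarray(weights)  # easy, medium, hard
        self.mean = None
        self.var = None

    def update(self, conf):
        """Fold a batch of confidence scores into the running statistics."""
        batch_mean, batch_var = conf.mean(), conf.var()
        if self.mean is None:
            self.mean, self.var = batch_mean, batch_var
        else:
            a = self.momentum
            self.mean = a * self.mean + (1 - a) * batch_mean
            self.var = a * self.var + (1 - a) * batch_var

    def tiers(self, conf):
        """Return 0 for easy (high confidence), 1 for medium, 2 for hard."""
        std = np.sqrt(self.var)
        tier = np.ones(conf.shape, dtype=int)
        tier[conf >= self.mean + std] = 0
        tier[conf <= self.mean - std] = 2
        return tier

    def loss_weights(self, conf):
        """Per-sample loss weights looked up from the tier index."""
        return self.weights[self.tiers(conf)]
```

A training loop would call `update` once per batch and multiply the per-sample losses by `loss_weights`; how the weights change over epochs is left open in this sketch.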

7. Limitations and Practical Recommendations

Sub-center ArcFace requires tuning of $K$; over-fragmenting classes diminishes the model's ability to aggregate sufficient samples per prototype. Empirically, $K = 3$ generally suffices for most heterogeneity found in unconstrained visual and signal datasets. Combining sub-center ArcFace with strong domain adaptation or quality control back-ends (e.g., AS-Norm, QMF) is recommended in cross-domain or semi-supervised workflows. Over-iterating clustering and fine-tuning cycles can harm the learned representations; a single round is generally sufficient (Qin et al., 2022).

Sub-center ArcFace presents a modular, generalizable technique for robustifying angular margin losses under label noise, with demonstrated gains across face and speaker recognition benchmarks (Deng et al., 2018, Baali et al., 25 Mar 2026, Qin et al., 2022).
