
Global Supervised Contrastive Loss

Updated 21 May 2026
  • GSupCon is an embedding learning objective that leverages a global memory bank to overcome batch-size limitations and enhance dataset-wide discrimination.
  • It replaces local contrastive sampling with a global dictionary of positives and negatives, improving efficiency and representation quality.
  • Empirical results show that GSupCon yields significant gains in mAP and rank-1 accuracy for re-identification and fine-grained classification tasks.

Global-Supervised Contrastive Loss (GSupCon) is an embedding learning objective that extends batch-based supervised contrastive learning to enable dataset-wide (global) discrimination during representation training. GSupCon addresses the inherent locality and batch-size dependence of classical supervised contrastive (SupCon) loss by leveraging a memory bank or global dictionary of features from the entire training set. This allows each anchor to be contrasted against true positives and negatives drawn from all available training examples, enhancing discriminative power and improving generalization, especially in large-scale identification and retrieval tasks such as vehicle re-identification, person re-identification, face recognition, and fine-grained classification (Hu et al., 2022, Kim et al., 2022, Khosla et al., 2020).

1. Motivation: From Local to Global Supervised Contrastive Learning

Standard supervised contrastive loss (SupCon) pulls together representations of samples with matching labels within a minibatch and pushes apart negative pairs, but it is intrinsically local: both positive and negative sets for an anchor are restricted to the current minibatch. This locality leads to several limitations. SupCon’s effectiveness depends on batch size since the number of available negative identities—and thus the tightness of class separation—is limited by the minibatch. Empirical and theoretical evidence indicates that increasing the number of negatives in the denominator improves generalization and class separation, but naively scaling up batch size demands prohibitively more GPU memory and can hinder convergence.

GSupCon overcomes these bottlenecks by constructing a global dictionary (a memory bank) from which the positives and negatives for each anchor are drawn across the entire training set. A given anchor is thus contrasted against every other sample, enforcing global class separation. Backpropagation memory does not grow with the dictionary, because only the anchor feature receives gradients; the positives and negatives in the dictionary are read-only and have no gradient paths (Hu et al., 2022).

2. Formal Mathematical Formulation

Let $T$ denote the complete set of training images, $f_i$ the anchor feature computed by the network for a sample $i$, and $\tilde{f}_a$ the (normalized) feature stored for image $a$ in the global dictionary $D \in \mathbb{R}^{|T| \times d}$. Define the positive set for anchor $i$ as $\tilde{P}(i)=\{p \in T \mid y_p = y_i\}$ and the negative set as $\tilde{N}(i) = T \setminus \tilde{P}(i)$.

The GSupCon loss per batch is:

$$\mathcal{L}_{\text{GSupCon}} = \sum_{i \in \text{batch}} \left( -\frac{1}{|\tilde{P}(i)|} \sum_{p \in \tilde{P}(i)} \log \frac{\exp(f_i \cdot \tilde{f}_p/\tau)}{\sum_{a \in T} \exp(f_i \cdot \tilde{f}_a /\tau)} \right)$$

where $\tau$ is the temperature hyperparameter. Only $f_i$ receives gradients; the features in the dictionary are updated using a momentum-based moving average (Hu et al., 2022).

This formulation replaces batch-limited positive and negative sets with global populations, enforcing that every anchor is simultaneously repelled from all global negatives and attracted to all global positives.
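As a rough illustration of the loss above, the following plain-Python sketch evaluates it for a toy dictionary. The function and argument names (`gsupcon_loss`, `bank`, `bank_labels`) and the default `tau=0.1` are illustrative assumptions, not taken from the source. Plain lists stand in for tensors, so no gradients are computed; in an autodiff framework only `anchors` would carry gradients.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gsupcon_loss(anchors, anchor_labels, bank, bank_labels, tau=0.1):
    """Sum over anchors of the mean negative log-probability of global positives.

    anchors: anchor features f_i for the current minibatch.
    bank:    stored features, one per training image (read-only here).
    Assumes every anchor label occurs at least once in bank_labels.
    """
    total = 0.0
    for f_i, y_i in zip(anchors, anchor_labels):
        logits = [dot(f_i, f_a) / tau for f_a in bank]
        # Log of the global denominator, computed stably via log-sum-exp.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Global positives: every dictionary entry sharing the anchor's label.
        positives = [z for z, y_a in zip(logits, bank_labels) if y_a == y_i]
        total += -sum(z - log_denom for z in positives) / len(positives)
    return total

# Toy dictionary with two classes (hypothetical values).
bank = [normalize(v) for v in [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]]
bank_labels = [0, 0, 1, 1]
anchor = normalize([1.0, 0.05])
print(gsupcon_loss([anchor], [0], bank, bank_labels))
```

Because each term is a negative log-probability, the loss is non-negative, and it is small when the anchor lies close to its global positives and far from everything else.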

3. Training Procedure and Dictionary Management

GSupCon requires maintaining a global feature dictionary $D$:

  • Initialization: $D$ is seeded by running a few epochs with standard SupCon or by one forward pass over all training images.
  • At each iteration:
    • Compute features $f_i$ for each sample $i$ in the minibatch $\mathcal{B}$ of size $B$.
    • For each anchor $i \in \mathcal{B}$, retrieve all positives and negatives from $D$ and compute $\mathcal{L}_{\text{GSupCon}}$ with global contrast.
    • Backpropagate; only the anchor $f_i$ carries gradient.
    • Update $\tilde{f}_i \leftarrow m\,\tilde{f}_i + (1-m)\,f_i$ (optionally normalize), where $m$ is the momentum coefficient.

Dictionary entries are kept outside the gradient tape, limiting memory use to $O(|T| \cdot d)$ for the dictionary and $O(B \cdot d)$ per batch of size $B$ for backpropagation (Hu et al., 2022, Khosla et al., 2020).
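The dictionary update step might look like the sketch below. It assumes the common convention in which the stored feature keeps weight `m` and the fresh feature gets weight `1 - m`; the source only states that a momentum-based moving average is used. The name `update_bank` and the toy values are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return v if norm == 0.0 else [x / norm for x in v]

def update_bank(bank, indices, features, m, renormalize=True):
    """In-place momentum update of the dictionary rows for one minibatch.

    bank:     stored feature vectors, one per training image.
    indices:  dataset indices of the minibatch samples.
    features: freshly computed features for those samples, treated as
              constants (in an autodiff framework they would be detached).
    m:        momentum coefficient; larger m means slower updates.
    """
    for idx, f in zip(indices, features):
        blended = [m * old + (1.0 - m) * new for old, new in zip(bank[idx], f)]
        bank[idx] = l2_normalize(blended) if renormalize else blended
    return bank

# Toy example: only row 0 belongs to the current minibatch.
bank = [[1.0, 0.0], [0.0, 1.0]]
update_bank(bank, indices=[0], features=[[0.0, 1.0]], m=0.5)
print(bank[0])
```

Only the rows belonging to the current minibatch are touched, so the cost of the update is independent of the dictionary size.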

4. Key Hyperparameters and Practical Considerations

The effectiveness and efficiency of GSupCon depend on several core hyperparameters:

  • Temperature $\tau$: Controls softmax sharpness; typical values are […].
  • Batch size $B$: Primarily impacts update frequency rather than negative pool size; moderate values (e.g., 64) suffice.
  • Memory bank momentum $m$: Sets the update rate for the dictionary; values in […] are standard.
  • Loss weight $\lambda$: When combined with cross-entropy or other losses, choose $\lambda$ to balance gradient magnitudes.
  • Denominator efficiency: The denominator requires a sum across $|T|$ entries; practical speedups include negative subsampling (e.g., 10k–50k per update) or nearest-neighbor focus.
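To make the role of the temperature concrete, the short sketch below applies a temperature-scaled softmax to a fixed set of similarities. The similarity scores and the two temperature values are arbitrary illustrations, not recommended settings from the source.

```python
import math

def softmax_with_temperature(similarities, tau):
    """Softmax over similarities / tau, computed stably."""
    logits = [s / tau for s in similarities]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cosine similarities between one anchor and three entries.
sims = [0.9, 0.7, 0.1]
for tau in (1.0, 0.1):
    probs = softmax_with_temperature(sims, tau)
    print(tau, [round(p, 3) for p in probs])
```

A smaller temperature concentrates probability mass on the most similar entries, which places more weight on hard negatives in the denominator.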

For large datasets (e.g., […]k images with a […]-dimensional feature), dictionary storage is tractable (≈2.4 GB in float32) and may be further reduced by quantization or per-class centroids (Hu et al., 2022).
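The storage estimate is simple arithmetic: number of images times feature dimension times bytes per value. The sketch below uses 300,000 images and 2,048 dimensions purely as a hypothetical combination that lands near the quoted figure; the sizes the authors had in mind are not stated here.

```python
def bank_bytes(num_entries, feature_dim, bytes_per_value=4):
    """Storage for a dense dictionary (float32 uses 4 bytes per value)."""
    return num_entries * feature_dim * bytes_per_value

# Hypothetical sizes, chosen only because they land near ~2.4 GB.
full_fp32 = bank_bytes(300_000, 2048)
full_fp16 = bank_bytes(300_000, 2048, bytes_per_value=2)
print(f"float32: {full_fp32 / 1e9:.2f} GB")  # float32: 2.46 GB
print(f"float16: {full_fp16 / 1e9:.2f} GB")  # float16: 1.23 GB
```

Storing one centroid per class instead of one feature per image replaces `num_entries` with the number of classes, which shrinks the footprint in proportion to the average class size.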

5. Empirical Evaluations and Discriminative Impact

GSupCon has demonstrated its efficacy on vehicle re-identification benchmarks of varying scale:

  • VeRi-776 (576 train IDs): SupCon + GSupCon achieves mAP of […] compared to […] for SupCon alone.
  • VehicleID (13,164 train IDs): GSupCon improves rank-1 by […]–[…], with combined losses yielding rank-[…].
  • VERI-Wild (30,671 train IDs): GSupCon surpasses SupCon by […]–[…] mAP.

Visualization of embedding spaces shows markedly tighter intra-class clusters and broader inter-class separations. Retrieval rankings for difficult positive pairs (e.g., extreme viewpoints) are significantly improved under GSupCon (Hu et al., 2022). On image classification, the global variant provides […]–[…] absolute improvements on standard metrics over local SupCon (Khosla et al., 2020).

6. Computational Complexity and Scalability

The per-batch time complexity is naively $O(B \cdot |T| \cdot d)$ for a batch of $B$ anchors, a dictionary of $|T|$ entries, and feature dimension $d$, due to the global denominator. Subsampling negatives or approximating via nearest-neighbor search is the typical remedy. The memory bank eliminates the need for backpropagation through non-anchor features, so backprop memory scales only as $O(B \cdot d)$. The approach scales gracefully to very large datasets, trading denominator computation for significant gains in representation quality and generalization, particularly when $|T| \gg$ batch size (Hu et al., 2022, Khosla et al., 2020).
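One way the negative-subsampling remedy could be realized is sketched below: all global positives are kept, while only a uniform random subset of negatives enters the denominator. This particular scheme, the function name `subsampled_anchor_loss`, and the toy values are assumptions for illustration; the source does not specify how negatives are sampled.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def subsampled_anchor_loss(f_i, y_i, bank, bank_labels, num_negatives, tau, rng):
    """Per-anchor loss with all global positives but a random subset of negatives.

    The label scan below is linear in the dictionary size; a practical
    version would precompute per-class index lists once.
    """
    pos_idx = [a for a, y in enumerate(bank_labels) if y == y_i]
    neg_idx = [a for a, y in enumerate(bank_labels) if y != y_i]
    sampled = rng.sample(neg_idx, min(num_negatives, len(neg_idx)))

    pos_logits = [dot(f_i, bank[a]) / tau for a in pos_idx]
    neg_logits = [dot(f_i, bank[a]) / tau for a in sampled]

    # Stable log-sum-exp over positives plus the sampled negatives only.
    all_logits = pos_logits + neg_logits
    peak = max(all_logits)
    log_denom = peak + math.log(sum(math.exp(z - peak) for z in all_logits))
    return -sum(z - log_denom for z in pos_logits) / len(pos_logits)

# Toy dictionary: two positives for class 0 and three negatives.
bank = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0], [0.6, -0.8]]
labels = [0, 0, 1, 2, 3]
rng = random.Random(0)
print(subsampled_anchor_loss([1.0, 0.0], 0, bank, labels, 2, 0.1, rng))
```

Dropping negatives shrinks the denominator, so the subsampled value never exceeds the full loss; it is a cheaper but biased surrogate, and the dot products per anchor fall from $|T|$ to the number of positives plus sampled negatives.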

| Parameter | Typical Value/Range | Role |
| --- | --- | --- |
| $\tau$ | […] | Distribution sharpness |
| $B$ | […] | Batch size |
| $m$ | […] | Momentum in dictionary |
| Dictionary size | $\lvert T \rvert \times d$ | All features in train set |

7. Extensions and Applications Beyond Vehicle ReID

GSupCon is domain-agnostic and compatible with any architecture or embedding paradigm relying on large-scale negative sampling. Identified application domains include:

  • Person re-identification at city-scale: Gallery with millions of IDs.
  • Face recognition in unconstrained settings: Maintains a global celebrity embedding bank.
  • Fine-grained image retrieval: Birds, cars, products where inter-class variations are minimal.
  • Semi-supervised/self-supervised learning: Mix supervised GSupCon with unsupervised momentum contrast for unlabeled data.
  • Cross-modal contrastive learning: Build joint global dictionaries for image–text or other modalities.
  • Few-shot classification: Use a global memory on a base dataset, adapt using GSupCon to imprint novel classes.

The central property enabling these applications is exhaustive, dataset-wide discrimination at each update, yielding far better manifold separation and embedding robustness as dataset scale increases (Hu et al., 2022).

References (3)
