SINCERE: Supervised InfoNCE Extensions

Updated 26 December 2025
  • The paper introduces SINCERE as a probabilistic NCE-based loss that eliminates intra-class repulsion by always including positives in the numerator.
  • It presents ProjNCE, a method that employs parametric projection functions to unify supervised and self-supervised learning under a mutual information bound.
  • Empirical evaluations demonstrate improved transfer learning performance and label noise robustness, with practical recommendations on batch size, temperature, and projection choices.

Supervised InfoNCE Extensions (SINCERE) encompass a family of theoretically motivated modifications of the InfoNCE and supervised contrastive losses, developed to address shortcomings of prior approaches—most notably, intra-class repulsion in SupCon—and to provide robust, information-theoretic guarantees for supervised representation learning. This article details the motivations, formal definitions, theoretical properties, empirical findings, and practical recommendations of SINCERE and related methodologies, situating them within the broader context of robust and flexible supervised contrastive learning.

1. Theoretical Background and Motivation

The InfoNCE loss is foundational in self-supervised learning, encouraging the alignment of augmented views of the same instance (positives) while separating all other samples (negatives) in the representation space. With the introduction of class labels, SupCon extends InfoNCE to treat all same-class samples as positives. However, the design of SupCon can result in intra-class repulsion; each time a same-class positive appears only in the denominator of an InfoNCE-type term (but not the numerator), it is effectively treated as a negative, causing repulsion among same-class embeddings, especially as the number of positives in a batch grows (Feeney et al., 2023).
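To make the repulsion mechanism concrete, recall the widely used "out" form of the SupCon loss (Khosla et al., 2020), written here in the same notation as the losses below, where $P(i)$ denotes the same-class positives of anchor $i$ and $A(i)$ all other samples in the batch:

$$L_{\rm SupCon} = \sum_{i} \frac{-1}{|P(i)|} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p/\tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a/\tau)}$$

Because $A(i)$ contains the other positives of class $i$, every term whose numerator holds one positive still pushes the anchor away from the remaining positives sitting in the denominator; this is the intra-class repulsion that SINCERE removes.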

This motivates the search for extensions of InfoNCE that maintain the theoretical properties of the original loss but properly utilize label information. SINCERE (Supervised Information Noise-Contrastive Estimation REvisited) and other recent approaches formalize supervised contrastive learning as a probabilistic noise-contrastive estimation (NCE) problem, thus restoring the principled discrimination between positive and negative pairs. In parallel, frameworks such as ProjNCE generalize the projection/critic structure of supervised contrastive losses to unify supervised and self-supervised objectives under a mutual information (MI) bound (Jeong et al., 11 Jun 2025).

2. Formal Definitions: SINCERE and ProjNCE

SINCERE Loss

SINCERE reformulates the supervised noise-contrastive setting: one sample of a given class serves as the anchor, the remaining batch members of that class act as known positives, and all samples from other classes are negatives. The SINCERE loss for an anchor $z_S$ and positive $z_p$ is $$L_{\rm SINCERE}(z_S, z_p) = -\log \frac{\exp(z_S \cdot z_p/\tau)}{\exp(z_S \cdot z_p/\tau) + \sum_{n \in \mathcal{N}} \exp(z_S \cdot z_n/\tau)}$$ where $\mathcal{N}$ is the set of negatives (samples not of the anchor's class) and $\tau$ is a temperature parameter. Crucially, the only positive appearing in the denominator is the one in the numerator, so same-class samples are never treated as negatives and intra-class repulsion is eliminated (Feeney et al., 2023).
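A minimal PyTorch-style sketch of this computation follows. The function name `sincere_loss`, the input conventions (embeddings `z` of shape `[N, D]` and integer `labels`), and the masking details are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def sincere_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Illustrative SINCERE-style loss: same-class samples never act as negatives."""
    z = F.normalize(z, dim=1)                     # embeddings assumed [N, D]
    sim = z @ z.T / tau                           # scaled cosine similarities, [N, N]
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = same_class & ~eye                  # positives: same class, excluding the anchor itself
    neg_mask = ~same_class                        # negatives: different class

    # Per-anchor sum over negatives only; other positives are excluded from the denominator.
    neg_exp_sum = (sim.exp() * neg_mask).sum(dim=1, keepdim=True)    # [N, 1]

    # For each (anchor, positive) pair: -log exp(s_ap) / (exp(s_ap) + sum_n exp(s_an)).
    log_prob = sim - torch.log(sim.exp() + neg_exp_sum)
    n_pos = pos_mask.sum(dim=1).clamp(min=1)
    per_anchor = (log_prob * pos_mask).sum(dim=1) / n_pos
    has_pos = pos_mask.any(dim=1)                 # skip anchors with no in-batch positive
    return -per_anchor[has_pos].mean()
```

Compared with a typical SupCon implementation, the only structural change is that the denominator mask excludes same-class samples, which is precisely what removes the repulsive gradient between positives.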

ProjNCE

ProjNCE generalizes both InfoNCE and SupCon by introducing parametric projection mappings $g_+, g_-$ for class embeddings and an optional adjustment term for negative pairs: $$I_{\rm NCE}^{\rm proj} = I_{\rm NCE}^{\rm self\text{-}p} + R$$ where $I_{\rm NCE}^{\rm self\text{-}p}$ follows $$I_{\rm NCE}^{\rm self\text{-}p} = \frac{1}{N} \sum_{i=1}^N \left[ -\log \frac{ \exp(\psi(f(x_i), g_+(c_i))) }{ \sum_{j=1}^N \exp(\psi(f(x_i), g_-(c_j))) } \right]$$ and $R$ is an expectation-based adjustment term over negative pairings. The critic $\psi$ is typically a scaled cosine similarity. Different choices of $g_+, g_-$ recover standard supervised (centroid-based SupCon), orthogonal projection (mean-based), kernel-smoothed, or median-based robust alignments (Jeong et al., 11 Jun 2025).
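The sketch below, again in PyTorch and purely illustrative, instantiates only the projected InfoNCE term with a cosine-similarity critic and a single learnable projection of class embeddings; setting $g_+ = g_-$ and omitting the adjustment term $R$ are simplifying assumptions, and the module name `ProjNCEHead` is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjNCEHead(nn.Module):
    """Illustrative projected InfoNCE term (adjustment term R omitted).

    A learnable class-embedding table followed by a linear map stands in for
    g_+ = g_-; the critic psi is temperature-scaled cosine similarity.
    """

    def __init__(self, num_classes: int, dim: int, tau: float = 0.07):
        super().__init__()
        self.class_embed = nn.Embedding(num_classes, dim)    # class embeddings c_j
        self.proj = nn.Linear(dim, dim, bias=False)           # parametric projection g
        self.tau = tau

    def forward(self, feats: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # feats: [N, D] encoder outputs f(x_i); labels: [N] class ids c_i.
        f = F.normalize(feats, dim=1)
        g = F.normalize(self.proj(self.class_embed(labels)), dim=1)   # g(c_j) for the batch
        logits = f @ g.T / self.tau            # psi(f(x_i), g(c_j)) for all i, j
        targets = torch.arange(len(feats), device=feats.device)
        # Row i is -log softmax with numerator psi(f(x_i), g(c_i)) and a sum over all j.
        return F.cross_entropy(logits, targets)
```

Replacing the learned table with per-class centroids of the batch embeddings would recover a centroid-based SupCon-style special case, in line with the projection choices listed above.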

3. Information-Theoretic Guarantees and Robustness

Information-Theoretic Bound (SINCERE)

The expected SINCERE loss is bounded below in terms of the symmetrized Kullback-Leibler divergence between class and noise distributions: $$L(\theta) \geq \log|\mathcal{N}| - \left[ D_{\rm KL}(p^- \| p^+) + D_{\rm KL}(p^+ \| p^-) \right]$$ where $p^+$ and $p^-$ denote the class and noise distributions, respectively. Equivalently, $\log|\mathcal{N}| - L(\theta)$ lower-bounds the symmetrized KL divergence, so minimizing SINCERE directly raises a lower bound on the distributional separability of classes in embedding space (Feeney et al., 2023).

Mutual Information Bound (ProjNCE)

ProjNCE is constructed so that $-I_{\rm NCE}^{\rm proj}$ is an explicit lower bound on $I(X;C)$ up to a batch-size-dependent additive constant, thereby providing a mutual information maximization objective at the class level, not just at the instance level (Jeong et al., 11 Jun 2025).

Robustness to Label Noise

A unified theoretical framework shows that standard InfoNCE and SupCon are not robust to symmetric label noise since their risk decompositions have function-dependent bias terms. Losses satisfying $\Delta\mathcal{R} = \mathrm{const}$ (i.e., the “noise-bias” independence condition) are robust. Symmetric InfoNCE (SymNCE), created by averaging InfoNCE and a reverse-order variant (RevNCE), and RINCE (Ranking InfoNCE) both satisfy this noise-tolerance criterion and outperform standard losses under severe label corruption (Cui et al., 2 Jan 2025).
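To make the noise model concrete, the helper below injects symmetric label noise: each label is replaced, with probability p, by a different class chosen uniformly at random. The function `add_symmetric_label_noise` is a hypothetical utility for experimentation, not part of any cited codebase.

```python
import torch

def add_symmetric_label_noise(labels: torch.Tensor, num_classes: int, p: float) -> torch.Tensor:
    """Replace each label, with probability p, by a uniformly chosen different class.

    This is the symmetric noise model under which the robustness of SymNCE- and
    RINCE-style losses is analyzed.
    """
    noisy = labels.clone()
    flip = torch.rand(len(labels), device=labels.device) < p
    # Offsets in {1, ..., num_classes - 1} guarantee the corrupted label differs from the original.
    offsets = torch.randint(1, num_classes, (len(labels),), device=labels.device)
    noisy[flip] = (labels[flip] + offsets[flip]) % num_classes
    return noisy
```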

4. Comparison of Loss Variants

Loss Name | Addressed Setting | Key Mechanism
SINCERE | Standard supervised | Probabilistic NCE that always includes all positives
ProjNCE | Flexible supervised/self-supervised | Projection-based embedding, mutual information bound
SymNCE | Label-noise robust | Adds InfoNCE and RevNCE, ensuring constant noise bias
RINCE | Noisy/graded positives | InfoNCE loss over ranked positive tiers

ProjNCE encompasses centroid, mean, median, or kernel-based projections as special cases, with the adjustment term RR ensuring the MI bound applies. SymNCE provably cancels the function-dependent noise-bias term, yielding robustness to label noise under the symmetric noise model (Cui et al., 2 Jan 2025). RINCE enables multi-tiered positive supervision, e.g., using superclasses or continuous similarity, and is beneficial where the binary positive/negative split is ill-defined (Hoffmann et al., 2022).
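As an illustration of how these projection choices differ in practice, the following sketch computes class prototypes from the current batch using either a mean or a coordinate-wise median. The function name and the decision to derive prototypes from batch embeddings (rather than learned class embeddings) are assumptions made for exposition.

```python
import torch

def class_prototypes(z: torch.Tensor, labels: torch.Tensor, kind: str = "mean") -> torch.Tensor:
    """Per-class prototypes usable as projection targets.

    kind="mean"   -> average of same-class embeddings (centroid/mean-style projection)
    kind="median" -> coordinate-wise median, more robust to outliers and label noise
    Returns a [num_classes_in_batch, D] tensor, ordered by sorted unique labels.
    """
    protos = []
    for c in labels.unique(sorted=True):
        members = z[labels == c]                       # [n_c, D] same-class embeddings
        if kind == "mean":
            protos.append(members.mean(dim=0))
        elif kind == "median":
            protos.append(members.median(dim=0).values)
        else:
            raise ValueError(f"unknown prototype kind: {kind}")
    return torch.stack(protos)
```

Swapping `kind="mean"` for `kind="median"` leaves the rest of the training loop unchanged, which is the practical appeal of the projection view.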

5. Empirical Evaluation and Observed Benefits

Empirical findings across CIFAR-10, CIFAR-100, ImageNet-100, and noisy-label benchmarks demonstrate:

  • SINCERE eliminates intra-class repulsion, yielding tighter intra-class clusters, a larger cosine-similarity margin between classes, and substantially improved transfer learning performance, especially for linear probing on Cars-196 and FGVC-Aircraft, with minimal change to standard classification accuracy (Feeney et al., 2023).
  • ProjNCE provides consistent improvements over SupCon and cross-entropy (e.g., 0.3–0.7% top-1 accuracy gains on CIFAR-10/100 and Tiny-ImageNet) and produces representations with higher estimated mutual information (Jeong et al., 11 Jun 2025).
  • SymNCE outperforms SupCon and RINCE under both synthetic and real-world label noise (e.g., the Clothing1M dataset and CIFAR-10/CIFAR-100 at high corruption rates), achieving linear-probe accuracy gains of 2–5% (Cui et al., 2 Jan 2025).
  • RINCE demonstrates benefits for hierarchical and graded similarity (e.g., superclasses), out-of-distribution detection, and video representation learning, yielding measurable accuracy and retrieval improvements over standard InfoNCE (Hoffmann et al., 2022).

6. Practical Recommendations

  • Batch Size: Larger batches (e.g., $N \geq 256$ or $512$) improve the tightness of information-theoretic bounds, especially for adjustment or kernel-based projection methods (Jeong et al., 11 Jun 2025).
  • Temperature: Typical values include $\tau = 0.07$ (ProjNCE) and $\tau = 0.1$ (SINCERE); adjust per dataset and for training stability (Feeney et al., 2023, Jeong et al., 11 Jun 2025).
  • Projection Function Choice:
    • Centroid (SupCon): Suitable for clean, balanced data.
    • Mean (SoftSupCon), kernel-based: Improved robustness under label noise or class imbalance.
    • Median (MedSupCon): Robust to outliers and extreme noise (Jeong et al., 11 Jun 2025).
  • Adjustment Term Weighting: In very noisy conditions, scale the adjustment term $R$ by $\beta > 1$ (e.g., $\beta \in [5, 10]$), trading off intra-class tightness for improved class separation (Jeong et al., 11 Jun 2025).
  • Computational Complexity: SINCERE, ProjNCE, and SupCon all have similar computational profiles: $O(N^2 D)$ for pairwise dot-product computations per batch (Feeney et al., 2023).
  • Integration: SINCERE and ProjNCE act as drop-in replacements for SupCon in existing pipelines (see the sketch after this list); RINCE and SymNCE may require ranked positive construction or additional compositional logic (Hoffmann et al., 2022, Cui et al., 2 Jan 2025).
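The following end-to-end sketch combines the recommendations above with the `sincere_loss` function sketched in Section 2. The encoder, dataset, optimizer, and hyperparameter values are illustrative defaults chosen within the recommended ranges, not prescriptions taken from the cited papers.

```python
import torch
from torch.utils.data import DataLoader
import torchvision
from torchvision import transforms

# Illustrative setup: a ResNet-style encoder trained with the sincere_loss sketch above,
# using a batch size and temperature in the ranges recommended in this section.
encoder = torchvision.models.resnet18(num_classes=128)        # 128-d embedding head
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.05, momentum=0.9, weight_decay=1e-4)

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True,
    transform=transforms.Compose([transforms.RandomResizedCrop(32),
                                  transforms.RandomHorizontalFlip(),
                                  transforms.ToTensor()]))
loader = DataLoader(train_set, batch_size=256, shuffle=True, drop_last=True)

for images, labels in loader:
    z = encoder(images)                       # [256, 128] embeddings
    loss = sincere_loss(z, labels, tau=0.1)   # drop-in replacement for a SupCon loss call
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    break  # single illustrative training step
```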

7. Future Directions and Open Questions

Several lines of investigation are proposed in the literature:

  • Extending SINCERE to model the full set of positives jointly rather than through pairwise terms, potentially allowing higher-order modeling of class distributions at the expense of computational complexity (Feeney et al., 2023).
  • Adapting these frameworks to multi-modal or multi-view contrastive learning.
  • Developing further robust loss functions for structured or asymmetric label noise, and establishing tighter information-theoretic bounds incorporating finite mini-batch approximations (Cui et al., 2 Jan 2025).
  • Generalizing projection functions in ProjNCE to more sophisticated kernel, graph, or optimal transport embeddings, aligning with domain-specific geometry or class structure (Jeong et al., 11 Jun 2025).
  • Incorporating margin-based or class-conditional weighting into SINCERE/ProjNCE for long-tailed or multi-modal data.

SINCERE and its related supervised InfoNCE extensions represent a rigorous convergence of geometric, probabilistic, and information-theoretic views on supervised representation learning, providing flexible, robust, and theoretically grounded objectives for modern contrastive pipelines (Feeney et al., 2023, Jeong et al., 11 Jun 2025, Cui et al., 2 Jan 2025).
