
ProjNCE: Unified Contrastive Learning

Updated 16 January 2026
  • ProjNCE is a generalized contrastive learning framework that extends InfoNCE by incorporating flexible projection functions and an adjustment term.
  • It unifies self-supervised and supervised approaches, enabling robust class separation and a tighter mutual information lower bound.
  • Empirical evaluations on multiple datasets and noise regimes demonstrate its superiority over SupCon and cross-entropy baselines.

ProjNCE is a generalized framework for contrastive learning that extends the classical InfoNCE objective to unify self-supervised and supervised contrastive approaches. By introducing flexible projection functions and an adjustment term, ProjNCE achieves a valid mutual information (MI) bound, enabling improved representation learning with robust class separation. This formulation accommodates diverse strategies for embedding class information and demonstrates empirical superiority over SupCon and cross-entropy baselines across various datasets, noise regimes, and evaluation criteria (Jeong et al., 11 Jun 2025).

1. Formal Definition and Mathematical Foundation

The multi-sample InfoNCE objective (for self-supervised scenarios) is traditionally:

$$I_{\mathrm{NCE}}^{\mathrm{self}}(X;C) = \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{p(x_i \mid c_i)\prod_{j\neq i} p(x_j)} \left[ -\log \frac{\exp\left(\psi(f(x_i), f(x_i))/\tau\right)}{\sum_{j=1}^N \exp\left(\psi(f(x_i), f(x_j))/\tau\right)} \right]$$

where $f(\cdot)$ is a normalized encoder, $\psi(u,v) = u \cdot v$ is the (inner-product) critic, and $\tau$ is the temperature.

ProjNCE introduces two projection functions:

$$g_+: \{1, \ldots, M\} \rightarrow \mathbb{R}^{d_z}, \qquad g_-: \{1, \ldots, M\} \rightarrow \mathbb{R}^{d_z}$$

which enable positives and negatives to use separate projections, yielding the generalized objective:

$$I_{\mathrm{NCE}}^{\mathrm{self\text{-}p}}(X;C) = \frac{1}{N}\sum_{i=1}^N \mathbb{E}_{p(x_i \mid c_i)\prod_{j\neq i} p(x_j)} \left[ -\log \frac{\exp\left(\psi(f(x_i), g_+(c_i))\right)}{\sum_{j=1}^N \exp\left(\psi(f(x_i), g_-(c_j))\right)} \right]$$

To ensure this variant forms a valid MI lower bound, an adjustment term is introduced:

$$R(X,C) = \mathbb{E}_{p(x)\prod_{j=1}^N p(x_j)} \left[ \frac{\sum_{k=1}^N \exp(\psi(f(x), g_+(c_k)))}{\sum_{k=1}^N \exp(\psi(f(x), g_-(c_k)))} \right]$$

The ProjNCE loss is thus:

$$\mathcal{L}_{\mathrm{ProjNCE}}(X;C) = I_{\mathrm{NCE}}^{\mathrm{self\text{-}p}}(X;C) + R(X,C)$$

This setup enables the encoder to pull representations toward positive class projections and push them away from negative projections, with the adjustment term ensuring that the overall loss remains a tight MI lower bound.
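To make the objective concrete, here is a minimal NumPy sketch of an in-batch Monte Carlo estimate of the ProjNCE loss. The function name and vectorization are illustrative rather than the authors' reference implementation; the temperature is omitted as in the generalized objective above, and `beta` weights the adjustment term (with `beta = 1` recovering $\mathcal{L}_{\mathrm{ProjNCE}}$).

```python
import numpy as np

def projnce_loss(z, labels, g_pos, g_neg, beta=1.0):
    """In-batch estimate of the ProjNCE loss (illustrative sketch).

    z      : (N, d) L2-normalized embeddings f(x_i)
    labels : (N,) integer class labels c_i
    g_pos  : (M, d) positive projections g_+(c), one row per class
    g_neg  : (M, d) negative projections g_-(c), one row per class
    beta   : weight on the adjustment term R (beta = 1 gives L_ProjNCE)
    """
    # Critic psi(u, v) = u . v against the projected positives and negatives.
    pos_logits = np.einsum("nd,nd->n", z, g_pos[labels])  # psi(f(x_i), g_+(c_i))
    neg_logits = z @ g_neg[labels].T                      # psi(f(x_i), g_-(c_j))
    # Generalized InfoNCE term with projected positives and negatives.
    log_denom = np.log(np.exp(neg_logits).sum(axis=1))
    info_nce = -(pos_logits - log_denom).mean()
    # Adjustment term R: ratio of summed positive to summed negative scores.
    pos_sum = np.exp(z @ g_pos[labels].T).sum(axis=1)
    neg_sum = np.exp(neg_logits).sum(axis=1)
    r_term = (pos_sum / neg_sum).mean()
    return info_nce + beta * r_term
```

Note that when $g_+ = g_-$ the ratio inside $R$ is identically one, which matches the SoftNCE special case discussed in Section 3.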

2. Mutual Information Bound Properties

The principal theoretical result is a multi-sample NWJ-type bound:

$$I(X;C) \ge 1 + \log N - I_{\mathrm{NCE}}^{\mathrm{self\text{-}p}}(X;C) - R(X,C)$$

or equivalently,

$$-\mathcal{L}_{\mathrm{ProjNCE}} \le I(X;C) - (1 + \log N)$$

Here, minimizing the ProjNCE loss tightens the lower bound on $I(X;C)$ regardless of the specific choices of critic or projections. The proof leverages the NWJ variational MI estimator, rearranging terms to recover the generalized InfoNCE and adjustment expectations. The formulation encompasses both self-supervised and supervised scenarios and generalizes the relationship between SupCon and MI estimation.
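In outline, the rearrangement runs as follows (a reconstruction consistent with the bound above; the exact critic parameterization in the paper is an assumption here). Start from the NWJ bound

$$I(X;C) \ge \mathbb{E}_{p(x,c)}[T(x,c)] - e^{-1}\,\mathbb{E}_{p(x)p(c)}\left[e^{T(x,c)}\right]$$

and choose the critic

$$T(x,c) = 1 + \log \frac{N \exp(\psi(f(x), g_+(c)))}{\sum_{j=1}^N \exp(\psi(f(x), g_-(c_j)))}$$

The first expectation evaluates to $1 + \log N - I_{\mathrm{NCE}}^{\mathrm{self\text{-}p}}(X;C)$, while the second, after replacing the expectation over an independent label by its $N$-sample average, becomes $R(X,C)$; substituting both recovers the stated inequality.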

3. Projection Function Strategies

ProjNCE’s core flexibility lies in the arbitrary choice of $(g_+, g_-)$ projection strategies. Key variants include:

  • Centroid-based (SupCon-style):

$$g_+(c) = \frac{1}{|P(c)|} \sum_{x_j: c_j = c} f(x_j), \qquad g_-(c) = f(x)$$

This recovers the standard SupCon loss plus the $R$ term.

  • Orthogonal (conditional-expectation/Soft variants):

$$\bar{f}(c) = \mathbb{E}[f(X) \mid C = c]$$

Estimated via kernel regression (Nadaraya–Watson estimator):

$$\hat{f}(c) = \frac{\sum_{j=1}^N K_h(d(f(x_j), \cdot))\,\mathbf{1}_{\{c_j=c\}}\, f(x_j)}{\sum_{j=1}^N K_h(d(f(x_j), \cdot))\,\mathbf{1}_{\{c_j=c\}}}$$

  • SoftNCE: $g_+ = g_- = \bar{f}$ (no $R$ term; $R = 1$)
  • SoftSupCon: $g_+ = \bar{f}$, $g_- = f$ (with $R$)
  • Median-based (robust):

$$f_{\mathrm{med}}(c) = \mathrm{median}\{f(x_j): c_j = c\}$$

This yields the analogous MedNCE and MedSupCon objectives.

This generalization enables tailored class embedding selection, supporting robustness to label noise and feature corruption via median and kernel strategies.
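For illustration, the three projection families can be computed in-batch roughly as follows. This is a NumPy sketch under our own naming; in particular, evaluating the Nadaraya–Watson kernel weights at the class centroid is a simplifying assumption, since the paper's estimator is a function of the query point.

```python
import numpy as np

def class_projections(z, labels, n_classes, kind="centroid", h=0.6):
    """In-batch class projections g(c) for ProjNCE variants (sketch).

    kind = "centroid": per-class mean of embeddings (SupCon-style g_+)
    kind = "median"  : dimension-wise per-class median (MedNCE / MedSupCon)
    kind = "soft"    : Nadaraya-Watson estimate of E[f(X)|C=c] with an
                       Epanechnikov kernel on l1 distances (query point
                       assumed to be the class centroid, a simplification)
    """
    d = z.shape[1]
    g = np.zeros((n_classes, d))
    for c in range(n_classes):
        zc = z[labels == c]
        if zc.size == 0:
            continue  # class absent from this batch
        if kind == "centroid":
            g[c] = zc.mean(axis=0)
        elif kind == "median":
            g[c] = np.median(zc, axis=0)
        elif kind == "soft":
            dist = np.abs(zc - zc.mean(axis=0)).sum(axis=1)  # l1 distances
            u = dist / h
            w = np.maximum(1.0 - u**2, 0.0)                  # Epanechnikov kernel
            if w.sum() == 0:
                w = np.ones(len(zc))  # all points outside bandwidth: fall back
            g[c] = (w[:, None] * zc).sum(axis=0) / w.sum()
    return g
```

The returned array can be passed directly as `g_pos` or `g_neg` to a ProjNCE-style loss; mixing kinds (e.g., median positives with raw-embedding negatives) corresponds to the hybrid variants above.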

4. Experimental Evaluation and Quantitative Performance

Experiments employ a ResNet-18 encoder with $d_z = 128$, the AdamW optimizer, batch sizes 256 or 512, and temperature $\tau = 0.07$. Datasets include CIFAR-10/100, Tiny-ImageNet, Imagenette, Caltech256, Food101, STL-10, and synthetic mixtures for MI estimation.

Top-1 Accuracy (%) Across Variants

| Dataset | CE | SupCon | ProjNCE | SoftNCE | SoftSupCon |
| --- | --- | --- | --- | --- | --- |
| CIFAR-10 | 92.79 | 93.47 | 93.90 | 93.15 | 93.36 |
| CIFAR-100 | 64.71 | 68.89 | 69.47 | 70.44 | 68.52 |
| Tiny-ImageNet | 16.26 | 50.92 | 54.08 | 49.13 | 49.94 |
| Imagenette | 84.97 | 84.74 | 84.71 | 85.40 | 84.18 |
| Caltech256 | 75.63 | 83.18 | 81.08 | 80.94 | 80.94 |
| Food101 | 68.29 | 69.18 | 70.18 | 68.27 | 67.69 |

Robustness to Label Noise: Top-1 Accuracy (%) on STL-10 at label-flip probability $p$

| Method | $p=0.0$ | $0.1$ | $0.2$ | $0.3$ | $0.4$ | $0.5$ |
| --- | --- | --- | --- | --- | --- | --- |
| SupCon | 77.71 | 71.89 | 67.43 | 62.85 | 51.63 | 50.41 |
| ProjNCE | 79.19 | 75.41 | 70.96 | 64.14 | 55.36 | 52.21 |
| SoftNCE | 78.10 | 72.94 | 70.39 | 61.89 | 56.58 | 54.94 |
| MedSupCon | 79.04 | 75.19 | 72.70 | 66.36 | 60.78 | 57.11 |

Mutual information estimates (Mixed-KSG) corroborate that ProjNCE consistently achieves higher $I(f(X);C)$ than SupCon.

5. Ablation Studies and Empirical Insights

Experimental ablations illuminate the influence of projection choice, adjustment-term weighting, kernel parameters, and robustness properties:

  • Adjustment Term Weight ($\beta$): Using $\mathcal{L}_\beta = I_{\mathrm{NCE}}^{\mathrm{self\text{-}p}} + \beta R$, t-SNE visualizations show that $\beta = 5$ induces class-cluster dispersion, facilitating greater false-positive separation, while $\beta = 10$ can lead to excessive intra-class tightness.
  • Kernel Bandwidth ($h$): In SoftNCE, setting $h = 0.6$ with $\ell_1$ distance and an Epanechnikov kernel maximizes accuracy; $h > 0.8$ degrades performance via oversmoothing.
  • Projection Dependence: SoftNCE tightens MI bounds most for binary classification; centroid-based ProjNCE excels in multiclass contexts; median variants are most robust to feature or label noise.
  • Noisy Feature Robustness: MedSupCon achieves the highest accuracy under pixel-level Gaussian noise, and integrating ProjNCE into joint-training pipelines augments performance by approximately 1 percentage point.

A plausible implication is that the flexibility in (g+,g)(g_+, g_-) adaptation is directly responsible for the observed improvements, particularly under challenging conditions.

6. Guidelines for Practical Use

Implementation of ProjNCE requires several practical considerations:

  • Batch Size: A minimum of 256 is required to stabilize both the InfoNCE and $R$ terms.
  • Temperature ($\tau$): Default $\tau = 0.07$; tuning in $[0.05, 0.2]$ can optimize results.
  • Adjustment Term Weight ($\beta$): Start with 1; increase for greater cluster separation, decrease if clusters are too dispersed.
  • Projections:
    • Centroid: In-batch averaging over each class.
    • Orthogonal (Soft): Kernel regression; use $\ell_1$ distance, an Epanechnikov kernel, and $h \in [0.4, 0.8]$.
    • Median: Compute the median dimension-wise.
  • Negative Sampling: In-batch negatives suffice; consider a memory bank for large datasets, maintaining class-independent sampling to preserve the validity of $R$.
  • Optimization: AdamW, linear learning-rate warmup, weight decay $10^{-4}$; gradient clipping of $R$ if necessary.
  • Downstream Tasks: After contrastive pre-training, freeze the encoder and train a linear classifier for 50–100 epochs.
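As one concrete way to realize the downstream step, here is a compact sketch of a frozen-feature linear probe. A closed-form ridge probe on one-hot targets stands in for the 50–100-epoch softmax probe described above; the function name and regularization strength are our assumptions, not the paper's protocol.

```python
import numpy as np

def linear_probe(features, labels, n_classes, l2=1e-4):
    """Fit a linear classifier on frozen encoder features (illustrative).

    features : (N, d) encoder outputs with the encoder frozen
    labels   : (N,) integer class labels
    Returns the weight matrix (d+1, n_classes) and training accuracy.
    """
    N, d = features.shape
    Y = np.eye(n_classes)[labels]               # one-hot targets
    X = np.hstack([features, np.ones((N, 1))])  # append a bias column
    # Closed-form ridge solution: W = (X^T X + l2 I)^-1 X^T Y
    W = np.linalg.solve(X.T @ X + l2 * np.eye(d + 1), X.T @ Y)
    preds = (X @ W).argmax(axis=1)
    return W, (preds == labels).mean()
```

In practice one would replace this with a softmax classifier trained by SGD on the held-out protocol, but the frozen-encoder structure of the evaluation is the same.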

This methodology offers a unified view of contrastive objectives under valid MI bounds, with projection flexibility and adjustment-term refinement yielding consistent, broadly-applicable performance improvements (Jeong et al., 11 Jun 2025).
