ProjNCE: Unified Contrastive Learning
- ProjNCE is a generalized contrastive learning framework that extends InfoNCE by incorporating flexible projection functions and an adjustment term.
- It unifies self-supervised and supervised approaches, enabling robust class separation and a tighter mutual information lower bound.
- Empirical evaluations on multiple datasets and noise regimes demonstrate its superiority over SupCon and cross-entropy baselines.
ProjNCE is a generalized framework for contrastive learning that extends the classical InfoNCE objective to unify self-supervised and supervised contrastive approaches. By introducing flexible projection functions and an adjustment term, ProjNCE achieves a valid mutual information (MI) bound, enabling improved representation learning with robust class separation. This formulation accommodates diverse strategies for embedding class information and demonstrates empirical superiority over SupCon and cross-entropy baselines across various datasets, noise regimes, and evaluation criteria (Jeong et al., 11 Jun 2025).
1. Formal Definition and Mathematical Foundation
The multi-sample InfoNCE objective (for self-supervised scenarios) is traditionally

$$\mathcal{L}_{\text{InfoNCE}} = -\,\mathbb{E}\left[\log \frac{e^{g(f(x),\,f(y))/\tau}}{\frac{1}{K}\sum_{j=1}^{K} e^{g(f(x),\,f(y_j))/\tau}}\right],$$

where $f$ is a normalized encoder, $g$ is the critic, and $\tau$ is the temperature scaling.
ProjNCE introduces two projection functions, $\pi^{+}$ and $\pi^{-}$, which enable positives and negatives to use separate projections, yielding the generalized objective

$$\mathcal{L}_{\text{gen}} = -\,\mathbb{E}\left[\log \frac{e^{g(f(x),\,\pi^{+}(y))/\tau}}{\frac{1}{K}\sum_{j=1}^{K} e^{g(f(x),\,\pi^{-}(y_j))/\tau}}\right].$$

To ensure this variant forms a valid MI lower bound, an adjustment term $A(\pi^{+},\pi^{-})$ is introduced, weighted by a coefficient $\lambda$. The ProjNCE loss is thus

$$\mathcal{L}_{\text{ProjNCE}} = \mathcal{L}_{\text{gen}} + \lambda\, A(\pi^{+},\pi^{-}).$$
This setup enables the encoder to pull representations toward positive class projections and push them away from negative projections, with the adjustment term ensuring that the overall loss remains a tight MI lower bound.
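The generalized objective above can be sketched numerically. The sketch below is a minimal, illustrative implementation under stated assumptions: the names `projnce_loss`, `proj_pos`, `proj_neg`, and the `adjustment` callable are hypothetical, and the paper's exact adjustment term is not reproduced here (it is left as a pluggable function).

```python
import numpy as np

def logsumexp(x):
    # Numerically stable log-sum-exp over a 1-D array.
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def projnce_loss(anchor, positive, negatives, proj_pos, proj_neg,
                 tau=0.1, adjustment=None):
    """Sketch of a ProjNCE-style loss for a single anchor.

    anchor:    (d,) normalized embedding of the anchor sample
    positive:  (d,) embedding used to build the positive projection
    negatives: (K, d) embeddings used to build negative projections
    proj_pos / proj_neg: projection functions (centroid, soft, median, ...)
    adjustment: optional paper-specific correction term (not specified here)
    """
    p = proj_pos(positive)                          # projected positive, (d,)
    n = np.stack([proj_neg(v) for v in negatives])  # projected negatives, (K, d)
    pos_logit = anchor @ p / tau
    neg_logits = n @ anchor / tau
    logits = np.concatenate(([pos_logit], neg_logits))
    loss = logsumexp(logits) - pos_logit            # InfoNCE-style cross-entropy
    if adjustment is not None:
        loss += adjustment(anchor, p)               # placeholder adjustment term
    return loss
```

With identity projections and no adjustment term, this reduces to an ordinary InfoNCE term, which is the sense in which ProjNCE generalizes the classical objective.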
2. Mutual Information Bound Properties
The principal theoretical result is a multi-sample NWJ-type bound of the form

$$I(X;Y) \;\ge\; \log K - \mathcal{L}_{\text{ProjNCE}},$$

or equivalently,

$$\mathcal{L}_{\text{ProjNCE}} \;\ge\; \log K - I(X;Y).$$

Here, minimizing the ProjNCE loss tightens the lower bound on $I(X;Y)$ regardless of the specific choices of critic or projections. The proof leverages the NWJ variational MI estimator, rearranging terms to recover the generalized InfoNCE and adjustment expectations. The formulation encompasses both self-supervised and supervised scenarios and generalizes the relationship between SupCon and MI estimation.
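The bound's key structural property, that the multi-sample estimate $\log K - \mathcal{L}$ is capped at $\log K$ for any critic, can be checked on toy data. The snippet below uses a simple product critic $f(x,y) = xy$ on correlated Gaussians; this critic choice is an illustrative assumption, not the paper's.

```python
import numpy as np

def row_logsumexp(m):
    # Stable log-sum-exp along each row of a 2-D score matrix.
    mx = m.max(axis=1, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=1, keepdims=True))).ravel()

rng = np.random.default_rng(1)
K, rho = 64, 0.9
x = rng.normal(size=K)
y = rho * x + np.sqrt(1.0 - rho**2) * rng.normal(size=K)  # correlated pairs

scores = np.outer(x, y)                                   # scores[i, j] = x_i * y_j
loss = np.mean(row_logsumexp(scores) - np.diag(scores))   # multi-sample InfoNCE
mi_lower = np.log(K) - loss                               # lower-bound estimate of I(X;Y)
```

Because each row's log-sum-exp includes the diagonal (positive) score, the per-sample loss is non-negative, so the estimate can never exceed $\log K$, which is exactly the saturation behavior of multi-sample bounds of this family.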
3. Projection Function Strategies
ProjNCE’s core flexibility lies in the arbitrary choice of projection strategies. Key variants include:
- Centroid-based (SupCon-style): $\pi(x_i) = \frac{1}{|C_{y_i}|}\sum_{j \in C_{y_i}} f(x_j)$, the in-batch mean embedding of the anchor's class. This recovers the standard SupCon loss plus the adjustment term.
- Orthogonal (conditional-expectation/Soft variants): the projection approximates the conditional expectation of same-class embeddings, estimated via kernel regression (Nadaraya–Watson estimator): $\hat{\pi}(x) = \frac{\sum_j K_h(x, x_j)\, f(x_j)}{\sum_j K_h(x, x_j)}$.
  - SoftNCE: the kernel-regression projection with the adjustment term omitted.
  - SoftSupCon: the kernel-regression projection combined with the adjustment term.
- Median-based (robust): $\pi$ computed as the dimension-wise median of class embeddings, yielding analogous MedNCE and MedSupCon objectives.
This generalization enables tailored class embedding selection, supporting robustness to label noise and feature corruption via median and kernel strategies.
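The three projection families can each be written in a few lines. The sketch below is illustrative: the bandwidth `h` and Euclidean distance in the soft variant are assumptions (the paper's exact choices are given only in its ablations), while the Epanechnikov kernel is named in the text.

```python
import numpy as np

def centroid_projection(embs, labels, cls):
    # SupCon-style: in-batch mean embedding of class `cls`.
    return embs[labels == cls].mean(axis=0)

def median_projection(embs, labels, cls):
    # Robust variant: dimension-wise median of class `cls` embeddings.
    return np.median(embs[labels == cls], axis=0)

def epanechnikov(u):
    # Epanechnikov kernel: 0.75 * (1 - u^2) on |u| < 1, else 0.
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def soft_projection(embs, anchor, h=1.0):
    # Nadaraya-Watson kernel regression around the anchor (soft variant).
    # Euclidean distance and bandwidth h are illustrative choices.
    w = epanechnikov(np.linalg.norm(embs - anchor, axis=1) / h)
    if w.sum() == 0.0:
        return anchor        # no in-bandwidth neighbors: fall back to anchor
    return (w[:, None] * embs).sum(axis=0) / w.sum()
```

In practice the centroid and median projections are computed per batch from labeled embeddings, while the soft projection needs no labels at all, which is what lets the same loss cover both supervised and self-supervised settings.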
4. Experimental Evaluation and Quantitative Performance
Experiments employ a ResNet-18 encoder, the AdamW optimizer, batch sizes of $256$ or $512$, and temperature scaling $\tau$. Datasets include CIFAR-10/100, Tiny-ImageNet, Imagenette, Caltech256, Food101, STL-10, and synthetic mixtures for MI estimation.
Top-1 Accuracy Across Variants
| Dataset | CE | SupCon | ProjNCE | SoftNCE | SoftSupCon |
|---|---|---|---|---|---|
| CIFAR-10 | 92.79 | 93.47 | 93.90 | 93.15 | 93.36 |
| CIFAR-100 | 64.71 | 68.89 | 69.47 | 70.44 | 68.52 |
| Tiny-ImageNet | 16.26 | 50.92 | 54.08 | 49.13 | 49.94 |
| Imagenette | 84.97 | 84.74 | 84.71 | 85.40 | 84.18 |
| Caltech256 | 75.63 | 83.18 | 81.08 | 80.94 | 80.94 |
| Food101 | 68.29 | 69.18 | 70.18 | 68.27 | 67.69 |
Robustness to Label Noise (STL-10, varying label-flip probability)
| Method | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 |
|---|---|---|---|---|---|---|
| SupCon | 77.71 | 71.89 | 67.43 | 62.85 | 51.63 | 50.41 |
| ProjNCE | 79.19 | 75.41 | 70.96 | 64.14 | 55.36 | 52.21 |
| SoftNCE | 78.10 | 72.94 | 70.39 | 61.89 | 56.58 | 54.94 |
| MedSupCon | 79.04 | 75.19 | 72.70 | 66.36 | 60.78 | 57.11 |
Mutual information estimates (Mixed-KSG) corroborate that ProjNCE consistently achieves higher estimated $I(X;Y)$ than SupCon.
5. Ablation Studies and Empirical Insights
Experimental ablations illuminate the influence of projection choice, adjustment-term weighting, kernel parameters, and robustness properties:
- Adjustment Term Weight ($\lambda$): t-SNE visualizations show that increasing $\lambda$ induces class-cluster dispersion, facilitating greater false-positive separation, while decreasing it can lead to excessive intra-class tightness.
- Kernel Bandwidth ($h$): In SoftNCE, an appropriately chosen bandwidth with the Epanechnikov kernel maximizes accuracy; an overly large bandwidth degrades performance via oversmoothing.
- Projection Dependence: SoftNCE tightens MI bounds most for binary classification; centroid-based ProjNCE excels in multiclass contexts; median variants are most robust to feature or label noise.
- Noisy Feature Robustness: MedSupCon achieves the highest accuracy under pixel-level Gaussian noise, and integrating ProjNCE into joint-training pipelines augments performance by approximately 1 percentage point.
A plausible implication is that the flexibility in choosing projections and adjustment weighting is directly responsible for the observed improvements, particularly under challenging conditions.
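The oversmoothing effect noted in the bandwidth ablation can be reproduced on synthetic data: as the bandwidth grows, a Nadaraya–Watson projection collapses toward the global mean and loses class specificity. The cluster geometry and bandwidth values below are synthetic, chosen only to make the effect visible.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def nw_estimate(embs, anchor, h):
    # Kernel-weighted mean of embeddings around the anchor.
    w = epanechnikov(np.linalg.norm(embs - anchor, axis=1) / h)
    return (w[:, None] * embs).sum(axis=0) / w.sum()

# Two well-separated synthetic clusters; the anchor sits in the first one.
cluster_a = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2]])
cluster_b = cluster_a + np.array([10.0, 0.0])
embs = np.vstack([cluster_a, cluster_b])
anchor = np.array([0.0, 0.0])

local = nw_estimate(embs, anchor, h=1.0)     # small bandwidth: stays in-cluster
smooth = nw_estimate(embs, anchor, h=100.0)  # large bandwidth: near-uniform weights
global_mean = embs.mean(axis=0)
```

With `h=1.0` the far cluster receives zero weight and the projection stays near the anchor's cluster; with `h=100.0` the weights are nearly uniform and the projection sits close to the global mean, i.e. the class information is smoothed away.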
6. Guidelines for Practical Use
Implementation of ProjNCE requires several practical considerations:
- Batch Size: A minimum of 256 is required to stabilize both the InfoNCE and adjustment terms.
- Temperature ($\tau$): Start from the paper's default; tuning within a small range can optimize results.
- Adjustment Term Weight ($\lambda$): Start with 1; increase for greater cluster separation, decrease if clusters are too dispersed.
- Projections:
- Centroid: In-batch averaging over class.
- Orthogonal (Soft): Kernel regression with the Epanechnikov kernel; choose the distance metric and bandwidth following the ablation findings.
- Median: Compute median dimension-wise.
- Negative Sampling: In-batch negatives suffice; consider a memory bank for large datasets, maintaining class-independent sampling to preserve validity.
- Optimization: AdamW with linear learning-rate warmup and weight decay; apply gradient clipping if necessary.
- Downstream Tasks: After contrastive pre-training, freeze encoder and train a linear classifier for 50–100 epochs.
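The downstream linear-probe step amounts to softmax regression on frozen features. The sketch below is a minimal numpy version under the assumption that `feats` are encoder outputs already extracted with the encoder frozen; the function name `linear_probe` and the optimizer (plain gradient descent rather than AdamW) are illustrative simplifications.

```python
import numpy as np

def linear_probe(feats, labels, n_classes, epochs=200, lr=0.5):
    """Train a linear classifier on frozen encoder features."""
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                  # cross-entropy gradient
        W -= lr * (feats.T @ grad)                   # only the probe is updated;
        b -= lr * grad.sum(axis=0)                   # the encoder stays frozen
    return W, b
```

Accuracy under this protocol measures the quality of the contrastive representation itself, since the encoder receives no gradient during probing.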
This methodology offers a unified view of contrastive objectives under valid MI bounds, with projection flexibility and adjustment-term refinement yielding consistent, broadly-applicable performance improvements (Jeong et al., 11 Jun 2025).