Supervised Contrastive Learning (SCL-1)
- Supervised Contrastive Learning (SCL-1) is a graph-based method that leverages labeled data to construct positive and negative graphs for discriminative feature extraction.
- It optimizes a log-softmax contrastive loss with gradient-based methods, pulling same-class samples together and pushing different-class samples apart.
- Empirical evaluations show SCL-1 achieves 3–7% accuracy improvements over traditional graph-based methods while matching deep self-supervised approaches.
Supervised Contrastive Learning (SCL-1) is a graph-based feature extraction method that leverages label information to construct discriminative low-dimensional representations. Within the unified framework of contrastive learning, SCL-1 uniquely formulates supervised contrastive objectives in terms of class-dependent positive and negative graphs, offering a direct and efficient approach for supervised dimensionality reduction and representation learning (Zhang, 2021).
1. Theoretical Formulation and Graph Construction
SCL-1 begins with a labeled dataset $X = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^d$ and class labels $y_i \in \{1, \dots, C\}$. The method constructs two binary adjacency matrices:
- $W^+_{ij} = 1$ if $y_i = y_j$ (with $i \neq j$), 0 otherwise; this defines the “positive” graph connecting all pairs within the same class.
- $W^-_{ij} = 1$ if $y_i \neq y_j$, 0 otherwise; the “negative” graph connects all cross-class pairs.
These graphs strictly encode supervision, ensuring the contrastive optimization is anchored in true class membership.
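For concreteness, here is a minimal sketch of this construction, assuming labels are supplied as an integer vector (the helper name `build_label_graphs` is illustrative, not from the source):

```python
import numpy as np

def build_label_graphs(y):
    """Build the binary positive/negative adjacency matrices from class labels.

    W_pos[i, j] = 1 if samples i and j share a class (i != j), else 0.
    W_neg[i, j] = 1 if samples i and j belong to different classes, else 0.
    """
    y = np.asarray(y)
    same_class = y[:, None] == y[None, :]        # (n, n) boolean comparison of labels
    W_pos = same_class.astype(float)
    np.fill_diagonal(W_pos, 0.0)                 # exclude trivial self-pairs
    W_neg = (~same_class).astype(float)
    return W_pos, W_neg

# Example: three samples of class 0 and two of class 1.
W_pos, W_neg = build_label_graphs([0, 0, 0, 1, 1])
```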
2. Contrastive Loss Objective
The loss in SCL-1 is defined to simultaneously maximize the similarity between projections of same-class samples (positives) and minimize the similarity between projections of different-class samples (negatives):

$$\mathcal{L}(P) = -\sum_{i=1}^{n} \sum_{j:\, W^+_{ij}=1} \log \frac{\exp(s_{ij}/\tau)}{\exp(s_{ij}/\tau) + \sum_{k:\, W^-_{ik}=1} \exp(s_{ik}/\tau)},$$

where $P \in \mathbb{R}^{d \times m}$ is the linear projection matrix, $z_i = P^\top x_i$, and

$$s_{ij} = \frac{z_i^\top z_j}{\|z_i\|\,\|z_j\|},$$

with the temperature parameter $\tau$ controlling the scale of similarity.
This loss encourages the mapping of intra-class samples to proximate locations in the embedding space, while ensuring that inter-class samples are maximally separated.
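A sketch of one way to implement this objective in PyTorch, consistent with the definitions above; the per-pair averaging and the exact composition of the softmax denominator are assumptions, and the original work may normalize differently:

```python
import torch

def scl1_loss(X, W_pos, W_neg, P, tau=0.5):
    """Log-softmax contrastive loss over the linear projection z_i = P^T x_i.

    Each positive pair (i, j) with W_pos[i, j] = 1 is scored against the
    anchor's negatives (W_neg[i, k] = 1) in the softmax denominator.
    """
    Z = torch.nn.functional.normalize(X @ P, dim=1)            # unit-norm projections -> cosine similarity
    sim = (Z @ Z.T) / tau                                       # scaled similarities s_ij / tau
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()   # per-row shift for numerical stability
    exp_sim = torch.exp(sim)
    neg_sum = (W_neg * exp_sim).sum(dim=1, keepdim=True)        # sum over each anchor's negatives
    log_prob = sim - torch.log(exp_sim + neg_sum)               # log softmax of each pair vs. the negatives
    return -(W_pos * log_prob).sum() / W_pos.sum()              # average over all positive pairs
```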
3. Optimization and Algorithmic Procedure
Minimization of $\mathcal{L}(P)$ is performed via first-order gradient-based optimization (Adam). The overall computational procedure is:
- Initialize $P$ (random or PCA pre-initialized).
- Compute $W^+$ and $W^-$ from the labels.
- Iteratively update $P$ using Adam (a minimal sketch follows this section), with standard hyperparameters:
- Learning rate $\eta$
- Moment parameters $\beta_1$, $\beta_2$, and stability constant $\varepsilon$
- Stop when the decrease in $\mathcal{L}(P)$ falls below a preset tolerance
- Output the optimized $P$ as the feature extractor.
Normalization and temperature tuning are essential; $\tau$ is typically selected from a small grid of candidate values.
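A minimal sketch of this training loop, reusing the `build_label_graphs` and `scl1_loss` helpers from the earlier sketches; the embedding dimension, learning rate, iteration budget, and tolerance below are placeholder values, not those of the original work:

```python
import torch

def fit_scl1(X_np, y, dim=30, tau=0.5, lr=1e-2, n_iters=500, tol=1e-6):
    """Fit the linear projection P by minimizing the SCL-1 loss with Adam."""
    X = torch.as_tensor(X_np, dtype=torch.float32)
    W_pos, W_neg = build_label_graphs(y)                     # supervision enters only here
    W_pos = torch.as_tensor(W_pos, dtype=torch.float32)
    W_neg = torch.as_tensor(W_neg, dtype=torch.float32)

    P = torch.randn(X.shape[1], dim, requires_grad=True)     # random init (a PCA init is also possible)
    optimizer = torch.optim.Adam([P], lr=lr)                 # default beta_1, beta_2, eps

    prev_loss = float("inf")
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = scl1_loss(X, W_pos, W_neg, P, tau=tau)
        loss.backward()
        optimizer.step()
        if abs(prev_loss - loss.item()) < tol:               # stop when the loss plateaus
            break
        prev_loss = loss.item()
    return P.detach()
```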
4. Role of Label Supervision
The only use of supervision in SCL-1 is in the assignment of pairs to $W^+$ and $W^-$. This strictly enforces class-aware contrast: all class-matched pairs are pulled together and all non-class pairs are contrasted away. No neighborhood heuristics or class-proximity scalings are used; the objective is fully determined by graph structure generated from labels.
5. Experimental Evaluation and Comparative Performance
SCL-1 has been compared against linear and deep (self-supervised) baselines:
| Method | Feature Type | Key Setting | Relative Accuracy |
|---|---|---|---|
| LPP, FLPP | Graph-based | Classic sparse connectivities | Baseline |
| LDA, LFDA | Supervised | Covariance optimization | Baseline |
| SimCLR | Deep, no labels | SSL, augmentations | Comparable |
| SCL-1 | Supervised | Label graph, contrastive loss | +3–7% over graph-based; matches or betters SimCLR |
Empirical evaluations on the Multiple Features, Yale, COIL20, MNIST, and USPS datasets demonstrate that SCL-1 robustly outperforms graph-based approaches (LPP variants) in classification accuracy and achieves parity with deep self-supervised systems, despite using only a shallow linear projection.
6. Implementation Notes and Hyperparameters
- Data preprocessing: For images, PCA to 100 dimensions followed by standardization; no PCA for tabular features.
- Classifier: 1-Nearest Neighbor in the learned embedding space.
- Random train/test splits (5 repeats); accuracy and recall as primary metrics.
- Convergence is typically rapid because the pair graphs are computed once from the labels and the log-softmax loss is smooth in $P$; a sketch of the full evaluation pipeline follows this list.
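A hedged sketch of the evaluation pipeline described above; the 50/50 split ratio, the embedding dimension, and the `fit_scl1` helper (from the earlier sketch) are assumptions, not settings confirmed by the source:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def evaluate_scl1(X, y, dim=30, n_repeats=5, seed=0):
    """PCA-100 + standardization, SCL-1 projection, 1-NN accuracy over random splits."""
    accs = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=seed + r, stratify=y)
        # Preprocess image features: PCA to 100 dimensions, then standardize.
        pca = PCA(n_components=100).fit(X_tr)
        scaler = StandardScaler().fit(pca.transform(X_tr))
        X_tr_p = scaler.transform(pca.transform(X_tr))
        X_te_p = scaler.transform(pca.transform(X_te))
        # Learn the projection on the training split only, then classify with 1-NN.
        P = fit_scl1(X_tr_p, y_tr, dim=dim).numpy()
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr_p @ P, y_tr)
        accs.append(clf.score(X_te_p @ P, y_te))
    return float(np.mean(accs))
```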
7. Significance, Scope, and Limitations
SCL-1 exemplifies the intersection of classical graph-based learning and modern supervised contrastive paradigms. By reducing feature extraction to supervised graph construction and a single log-softmax contrastive loss, SCL-1 provides an interpretable, efficient, and shallow alternative to both kernel methods and deep SSL. Performance is robust and closely tracks or surpasses deep contrastive methods on real datasets, despite requiring no augmentations, memory banks, or architectural modifications (Zhang, 2021).
A plausible implication is that the expressivity of contrastive supervision, even in linear settings, is sufficient to recover highly discriminative embeddings given fully-labeled data. SCL-1's elegant and minimal design makes it suitable for direct deployment in classical machine learning as well as for initializing downstream deep learning modules.
In sum, SCL-1 stands as an effective and theoretically grounded approach for supervised representation learning, leveraging pairwise label constraints within a contrastive optimization framework and combining interpretability with strong empirical performance.