Supervised Contrastive Learning (SCL-1)
- Supervised Contrastive Learning (SCL-1) is a graph-based method that leverages labeled data to construct positive and negative graphs for discriminative feature extraction.
- It optimizes a log-softmax contrastive loss with gradient-based methods, pulling same-class samples together and pushing different-class samples apart.
- Empirical evaluations show SCL-1 achieves 3–7% accuracy improvements over traditional graph-based methods while matching deep self-supervised approaches.
Supervised Contrastive Learning (SCL-1) is a graph-based feature extraction method that leverages label information to construct discriminative low-dimensional representations. Within the unified framework of contrastive learning, SCL-1 uniquely formulates supervised contrastive objectives in terms of class-dependent positive and negative graphs, offering a direct and efficient approach for supervised dimensionality reduction and representation learning (Zhang, 2021).
1. Theoretical Formulation and Graph Construction
SCL-1 begins with a labeled dataset $X = \{x_i\}_{i=1}^{n} \subset \mathbb{R}^d$ and class labels $y_i \in \{1, \dots, C\}$. The method constructs two binary adjacency matrices:
- $W^+_{ij} = 1$ if $y_i = y_j$ (with $i \neq j$), 0 otherwise; this defines the “positive” graph connecting all pairs within the same class.
- $W^-_{ij} = 1$ if $y_i \neq y_j$, 0 otherwise; the “negative” graph connects all cross-class pairs.
These graphs strictly encode supervision, ensuring the contrastive optimization is anchored in true class membership.
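For concreteness, here is a minimal sketch of this construction, assuming labels are supplied as an integer vector (the helper name `build_label_graphs` is illustrative, not from the source):

```python
import numpy as np

def build_label_graphs(y):
    """Build the binary positive/negative adjacency matrices from class labels.

    W_pos[i, j] = 1 if samples i and j share a class (i != j), else 0.
    W_neg[i, j] = 1 if samples i and j belong to different classes, else 0.
    """
    y = np.asarray(y)
    same_class = y[:, None] == y[None, :]        # (n, n) boolean comparison of labels
    W_pos = same_class.astype(float)
    np.fill_diagonal(W_pos, 0.0)                 # exclude trivial self-pairs
    W_neg = (~same_class).astype(float)
    return W_pos, W_neg

# Example: three samples of class 0 and two of class 1.
W_pos, W_neg = build_label_graphs([0, 0, 0, 1, 1])
```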
2. Contrastive Loss Objective
The loss in SCL-1 is defined to simultaneously maximize the similarity between projections of same-class samples (positives) and minimize the similarity between projections of different-class samples (negatives):

$$\mathcal{L}(P) = -\sum_{i=1}^{n} \sum_{j:\, W^+_{ij}=1} \log \frac{\exp(s_{ij}/\tau)}{\exp(s_{ij}/\tau) + \sum_{k:\, W^-_{ik}=1} \exp(s_{ik}/\tau)},$$

where $P \in \mathbb{R}^{d \times m}$ is the linear projection matrix, $z_i = P^\top x_i$, and

$$s_{ij} = \frac{z_i^\top z_j}{\|z_i\|\,\|z_j\|},$$

with the temperature parameter $\tau$ controlling the scale of similarity.
This loss encourages the mapping of intra-class samples to proximate locations in the embedding space, while ensuring that inter-class samples are maximally separated.
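A sketch of one way to implement this objective in PyTorch, consistent with the definitions above; the per-pair averaging and the exact composition of the softmax denominator are assumptions, and the original work may normalize differently:

```python
import torch

def scl1_loss(X, W_pos, W_neg, P, tau=0.5):
    """Log-softmax contrastive loss over the linear projection z_i = P^T x_i.

    Each positive pair (i, j) with W_pos[i, j] = 1 is scored against the
    anchor's negatives (W_neg[i, k] = 1) in the softmax denominator.
    """
    Z = torch.nn.functional.normalize(X @ P, dim=1)            # unit-norm projections -> cosine similarity
    sim = (Z @ Z.T) / tau                                       # scaled similarities s_ij / tau
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()   # per-row shift for numerical stability
    exp_sim = torch.exp(sim)
    neg_sum = (W_neg * exp_sim).sum(dim=1, keepdim=True)        # sum over each anchor's negatives
    log_prob = sim - torch.log(exp_sim + neg_sum)               # log softmax of each pair vs. the negatives
    return -(W_pos * log_prob).sum() / W_pos.sum()              # average over all positive pairs
```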
3. Optimization and Algorithmic Procedure
Minimization of $\mathcal{L}(P)$ is performed via first-order gradient-based optimization (Adam). The overall computational procedure is:
- Initialize $P$ (random or PCA pre-initialized).
- Compute $W^+$ and $W^-$ from the labels.
- Iteratively update $P$ using Adam (a minimal sketch follows this section), with standard hyperparameters:
- Learning rate $\eta$
- Moment parameters $\beta_1$, $\beta_2$, and stability constant $\varepsilon$
- Stop when the decrease in $\mathcal{L}(P)$ falls below a preset tolerance
- Output the optimized $P$ as the feature extractor.
Normalization and temperature tuning are essential; $\tau$ is typically selected from a small grid of candidate values.
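A minimal sketch of this training loop, reusing the `build_label_graphs` and `scl1_loss` helpers from the earlier sketches; the embedding dimension, learning rate, iteration budget, and tolerance below are placeholder values, not those of the original work:

```python
import torch

def fit_scl1(X_np, y, dim=30, tau=0.5, lr=1e-2, n_iters=500, tol=1e-6):
    """Fit the linear projection P by minimizing the SCL-1 loss with Adam."""
    X = torch.as_tensor(X_np, dtype=torch.float32)
    W_pos, W_neg = build_label_graphs(y)                     # supervision enters only here
    W_pos = torch.as_tensor(W_pos, dtype=torch.float32)
    W_neg = torch.as_tensor(W_neg, dtype=torch.float32)

    P = torch.randn(X.shape[1], dim, requires_grad=True)     # random init (a PCA init is also possible)
    optimizer = torch.optim.Adam([P], lr=lr)                 # default beta_1, beta_2, eps

    prev_loss = float("inf")
    for _ in range(n_iters):
        optimizer.zero_grad()
        loss = scl1_loss(X, W_pos, W_neg, P, tau=tau)
        loss.backward()
        optimizer.step()
        if abs(prev_loss - loss.item()) < tol:               # stop when the loss plateaus
            break
        prev_loss = loss.item()
    return P.detach()
```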
4. Role of Label Supervision
The only use of supervision in SCL-1 is in the assignment of pairs to $W^+$ and $W^-$. This strictly enforces class-aware contrast: all class-matched pairs are pulled together and all non-class pairs are contrasted away. No neighborhood heuristics or class-proximity scalings are used; the objective is fully determined by graph structure generated from labels.
5. Experimental Evaluation and Comparative Performance
SCL-1 has been compared against linear and deep (self-supervised) baselines:
| Method | Feature Type | Key Setting | Relative Accuracy |
|---|---|---|---|
| LPP, FLPP | Graph-based | Classic sparse connectivities | Baseline |
| LDA, LFDA | Supervised | Covariance optimization | Baseline |
| SimCLR | Deep, no labels | SSL, augmentations | Comparable |
| SCL-1 | Supervised | Label graph, contrastive loss | +3–7% over graph-based; matches or betters SimCLR |
Empirical evaluations on the Multiple Features, Yale, COIL20, MNIST, and USPS datasets demonstrate that SCL-1 robustly outperforms graph-based approaches (LPP variants) in classification accuracy and achieves parity with deep self-supervised systems, despite using only a shallow linear projection.
6. Implementation Notes and Hyperparameters
- Data preprocessing: For images, PCA to 100 dimensions followed by standardization; no PCA for tabular features.
- Classifier: 1-Nearest Neighbor in the learned embedding space.
- Random train/test splits (5 repeats); accuracy and recall as primary metrics.
- Convergence is typically rapid because the pair graphs are computed once from the labels and the log-softmax loss is smooth in $P$; a sketch of the full evaluation pipeline follows this list.
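A hedged sketch of the evaluation pipeline described above; the 50/50 split ratio, the embedding dimension, and the `fit_scl1` helper (from the earlier sketch) are assumptions, not settings confirmed by the source:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

def evaluate_scl1(X, y, dim=30, n_repeats=5, seed=0):
    """PCA-100 + standardization, SCL-1 projection, 1-NN accuracy over random splits."""
    accs = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.5, random_state=seed + r, stratify=y)
        # Preprocess image features: PCA to 100 dimensions, then standardize.
        pca = PCA(n_components=100).fit(X_tr)
        scaler = StandardScaler().fit(pca.transform(X_tr))
        X_tr_p = scaler.transform(pca.transform(X_tr))
        X_te_p = scaler.transform(pca.transform(X_te))
        # Learn the projection on the training split only, then classify with 1-NN.
        P = fit_scl1(X_tr_p, y_tr, dim=dim).numpy()
        clf = KNeighborsClassifier(n_neighbors=1).fit(X_tr_p @ P, y_tr)
        accs.append(clf.score(X_te_p @ P, y_te))
    return float(np.mean(accs))
```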
7. Significance, Scope, and Limitations
SCL-1 exemplifies the intersection of classical graph-based learning and modern supervised contrastive paradigms. By reducing feature extraction to supervised graph construction and a single log-softmax contrastive loss, SCL-1 provides an interpretable, efficient, and shallow alternative to both kernel methods and deep SSL. Performance is robust and closely tracks or surpasses deep contrastive methods on real datasets, despite requiring no augmentations, memory banks, or architectural modifications (Zhang, 2021).
A plausible implication is that the expressivity of contrastive supervision, even in linear settings, is sufficient to recover highly discriminative embeddings given fully-labeled data. SCL-1's elegant and minimal design makes it suitable for direct deployment in classical machine learning as well as for initializing downstream deep learning modules.
In sum, SCL-1 stands as an effective and theoretically grounded approach for supervised representation learning, leveraging pairwise label constraints within a contrastive optimization framework and combining interpretability with strong empirical performance.