
Supervised Contrastive Learning (SCL-1)

Updated 25 November 2025
  • Supervised Contrastive Learning (SCL-1) is a graph-based method that leverages labeled data to construct positive and negative graphs for discriminative feature extraction.
  • It minimizes a log-softmax contrastive loss with gradient-based optimization, pulling same-class samples together and pushing different-class samples apart.
  • Empirical evaluations show SCL-1 achieves 3–7% accuracy improvements over traditional graph-based methods while matching deep self-supervised approaches.

Supervised Contrastive Learning (SCL-1) is a graph-based feature extraction method that leverages label information to construct discriminative low-dimensional representations. Within the unified framework of contrastive learning, SCL-1 uniquely formulates supervised contrastive objectives in terms of class-dependent positive and negative graphs, offering a direct and efficient approach for supervised dimensionality reduction and representation learning (Zhang, 2021).

1. Theoretical Formulation and Graph Construction

SCL-1 begins with a labeled dataset $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{D \times n}$ and class labels $\{c_i\}_{i=1}^n$. The method constructs two binary adjacency matrices:

  • $S^{\text{pos}}$: $S^{\text{pos}}_{ij} = 1$ if $c_i = c_j$, 0 otherwise; this defines the “positive” graph connecting all pairs within the same class.
  • $S^{\text{neg}}$: $S^{\text{neg}}_{ij} = 1$ if $c_i \neq c_j$, 0 otherwise; the “negative” graph connects all cross-class pairs.

These graphs strictly encode supervision, ensuring the contrastive optimization is anchored in true class membership.
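
As an illustration, a minimal NumPy sketch of this graph construction (the function name and the handling of self-pairs are assumptions for illustration, not taken from the paper):

```python
import numpy as np

def build_label_graphs(labels):
    """Build the binary positive/negative adjacency matrices from class labels.

    S_pos[i, j] = 1 if labels[i] == labels[j], else 0
    S_neg[i, j] = 1 if labels[i] != labels[j], else 0
    S_who = S_pos + S_neg  (the all-pairs graph)
    """
    labels = np.asarray(labels)
    same_class = labels[:, None] == labels[None, :]   # boolean n x n comparison
    S_pos = same_class.astype(float)                  # intra-class pairs
    S_neg = 1.0 - S_pos                               # cross-class pairs
    S_who = S_pos + S_neg                             # all-ones matrix
    # Note: self-pairs (i == j) fall into S_pos here; the paper's convention may differ.
    return S_pos, S_neg, S_who

# Example: three samples from two classes
S_pos, S_neg, S_who = build_label_graphs([0, 0, 1])
```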

2. Contrastive Loss Objective

The loss in SCL-1 is defined to simultaneously maximize the similarity between projections of same-class samples (positives) and minimize the similarity between projections of different-class samples (negatives):

$$L(P) = \sum_{i=1}^n -\log \frac{\sum_{j=1}^n S^{\text{pos}}_{ij} \exp\big(\text{SIM}(P^\top x_i, P^\top x_j)\big)}{\sum_{j=1}^n S^{\text{who}}_{ij} \exp\big(\text{SIM}(P^\top x_i, P^\top x_j)\big)}$$

where $P \in \mathbb{R}^{D \times d}$ is the linear projection matrix, $S^{\text{who}} = S^{\text{pos}} + S^{\text{neg}}$, and

$$\text{SIM}(P^\top x_i, P^\top x_j) = \frac{(P^\top x_i)^\top (P^\top x_j)}{\|P^\top x_i\| \, \|P^\top x_j\| \, \sigma}$$

with the temperature parameter $\sigma > 0$ controlling the scale of similarity.

This loss encourages intra-class samples to be mapped to nearby locations in the embedding space while driving inter-class samples apart.
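
For concreteness, here is a PyTorch sketch of this objective under the definitions above; the function name and the reuse of the label graphs from the earlier snippet are assumptions, not the paper's reference implementation:

```python
import torch

def scl1_loss(X, P, S_pos, S_who, sigma=0.1):
    """Log-softmax contrastive loss L(P) over the label graphs.

    X      : (D, n) data matrix (samples as columns)
    P      : (D, d) linear projection matrix
    S_pos  : (n, n) positive (same-class) graph
    S_who  : (n, n) whole graph, S_pos + S_neg
    sigma  : temperature scaling the cosine similarity
    """
    Z = P.T @ X                                  # (d, n) projected samples
    Z = Z / Z.norm(dim=0, keepdim=True)          # column-wise L2 normalization
    sim = (Z.T @ Z) / sigma                      # (n, n) matrix of SIM(P^T x_i, P^T x_j)
    expsim = torch.exp(sim)
    num = (S_pos * expsim).sum(dim=1)            # positive-pair mass for each anchor i
    den = (S_who * expsim).sum(dim=1)            # all-pair mass for each anchor i
    return -(torch.log(num / den)).sum()
```

Because the loss is differentiable in $P$, any first-order optimizer can be applied to it directly, which is exactly what the next section describes.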

3. Optimization and Algorithmic Procedure

Minimization of $L(P)$ is performed via first-order gradient-based optimization (Adam). The overall computational procedure is:

  1. Initialize $P_0$ (random or PCA pre-initialized).
  2. Compute $S^{\text{pos}}$, $S^{\text{neg}}$, $S^{\text{who}}$ from the labels.
  3. Iteratively update $P$ using Adam, with standard hyperparameters:
    • Learning rate $\alpha = 10^{-3}$
    • $\beta_1 = 0.9$, $\beta_2 = 0.999$, $\epsilon = 10^{-8}$
    • Stop when $|L(P_t) - L(P_{t-1})| < 10^{-3}$
  4. Output the optimized $P$ as the feature extractor.

Normalization and temperature tuning are essential; $\sigma$ is typically selected from $\{0.01, 0.1, 1, 10, 100, 1000\}$.
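
A minimal sketch of this procedure, reusing build_label_graphs and scl1_loss from the snippets above; the iteration cap and the random (non-PCA) initialization are assumptions made for illustration:

```python
def fit_scl1(X, labels, d, sigma=0.1, lr=1e-3, tol=1e-3, max_iters=5000):
    """Learn the projection P by minimizing L(P) with Adam."""
    S_pos, S_neg, S_who = build_label_graphs(labels)
    S_pos = torch.as_tensor(S_pos, dtype=X.dtype)
    S_who = torch.as_tensor(S_who, dtype=X.dtype)

    D = X.shape[0]
    P = torch.randn(D, d, requires_grad=True)     # random init (PCA init is also possible)
    opt = torch.optim.Adam([P], lr=lr, betas=(0.9, 0.999), eps=1e-8)

    prev = float("inf")
    for _ in range(max_iters):
        opt.zero_grad()
        loss = scl1_loss(X, P, S_pos, S_who, sigma)
        loss.backward()
        opt.step()
        if abs(prev - loss.item()) < tol:         # stop on a small change in L(P)
            break
        prev = loss.item()
    return P.detach()
```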

4. Role of Label Supervision

The only use of supervision in SCL-1 is the assignment of pairs to $S^{\text{pos}}$ and $S^{\text{neg}}$. This strictly enforces class-aware contrast: all same-class pairs are pulled together and all cross-class pairs are contrasted away. No neighborhood heuristics or class-proximity scalings are used; the objective is fully determined by the graph structure generated from the labels.

5. Experimental Evaluation and Comparative Performance

SCL-1 has been compared against linear and deep (self-supervised) baselines:

| Method | Feature Type | Key Setting | Accuracy Improvement |
| --- | --- | --- | --- |
| LPP, FLPP | Graph-based | Classic sparse connectivities | Baseline |
| LDA, LFDA | Supervised | Covariance optimization | Baseline |
| SimCLR | Deep, no labels | SSL, augmentations | Comparable |
| SCL-1 | Supervised | Label graph, contrastive loss | +3–7% over graph-based; matches or betters SimCLR |

Empirical evaluations on Multiple Features, Yale, COIL20, MNIST, and USPS datasets demonstrate that SCL-1 robustly outperforms graph-based approaches (LPP variants) in classification accuracy, and achieves parity with deep self-supervised systems, despite being a shallow projector.

6. Implementation Notes and Hyperparameters

  • Data preprocessing: For images, PCA to 100 dimensions followed by standardization; no PCA for tabular features.
  • Classifier: 1-Nearest Neighbor in the learned embedding space.
  • Random train/test splits (5 repeats); accuracy and recall as primary metrics (a sketch of this evaluation protocol follows this list).
  • Convergence is typically rapid, since the pair graphs are fixed in advance from the labels and the log-softmax loss is smooth.
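
A sketch of that protocol using scikit-learn, assuming image data and the fit_scl1 routine above; the embedding dimension, split size, and function names are placeholders rather than the paper's exact settings:

```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def evaluate_scl1(features, labels, d=30, n_repeats=5, test_size=0.5, seed=0):
    """PCA -> standardize -> SCL-1 projection -> 1-NN classification, averaged over random splits."""
    accs = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            features, labels, test_size=test_size, random_state=seed + r, stratify=labels)

        # For image data: PCA to 100 dimensions, then standardization
        pca = PCA(n_components=100).fit(X_tr)
        scaler = StandardScaler().fit(pca.transform(X_tr))
        X_tr_p = scaler.transform(pca.transform(X_tr))
        X_te_p = scaler.transform(pca.transform(X_te))

        # Learn the projection on the training split (samples as columns)
        P = fit_scl1(torch.tensor(X_tr_p.T, dtype=torch.float32), y_tr, d=d)
        Z_tr = X_tr_p @ P.numpy()
        Z_te = X_te_p @ P.numpy()

        # 1-Nearest-Neighbor classification in the learned embedding space
        knn = KNeighborsClassifier(n_neighbors=1).fit(Z_tr, y_tr)
        accs.append(accuracy_score(y_te, knn.predict(Z_te)))
    return float(np.mean(accs))
```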

7. Significance, Scope, and Limitations

SCL-1 exemplifies the intersection of classical graph-based learning and modern supervised contrastive paradigms. By reducing feature extraction to supervised graph construction and a single log-softmax contrastive loss, SCL-1 provides an interpretable, efficient, and shallow alternative to both kernel methods and deep SSL. Performance is robust and closely matches or surpasses deep contrastive methods on real datasets, despite requiring no augmentations, memory banks, or architectural modifications (Zhang, 2021).

A plausible implication is that the expressivity of contrastive supervision, even in linear settings, is sufficient to recover highly discriminative embeddings given fully-labeled data. SCL-1's elegant and minimal design makes it suitable for direct deployment in classical machine learning as well as for initializing downstream deep learning modules.

In sum, SCL-1 stands as an effective and theoretically grounded approach for supervised representation learning, leveraging pairwise label constraints in the context of contrastive optimization and yielding state-of-the-art results in both interpretability and empirical performance.
