Supervised Contrastive Adversarial Learning
- Supervised Contrastive Adversarial Learning (SCAL) is an adversarial defense framework that combines contrastive objectives with adversarial training to create robust feature spaces.
- It employs margin-based constraints and hard-mined contrastive losses to enforce intra-class clustering and inter-class separation under adversarial perturbations.
- Empirical results across image classification, NLP, and sequential models show that SCAL enhances robustness against adversarial attacks while maintaining high clean accuracy.
Supervised Contrastive Adversarial Learning (SCAL) is a family of adversarial defense frameworks designed to improve the robustness of deep neural networks against adversarial attacks. SCAL integrates supervised contrastive learning with adversarial training, often augmented by margin-based constraints or hard-mined contrastive losses. The central premise is that enforcing semantic structure and separation in feature space via contrastive objectives, particularly under adversarial perturbations, yields models that are robust to attack without sacrificing clean accuracy. Key instantiations have been introduced for image classification (Wang et al., 27 Dec 2024, Bhattacharya et al., 31 Oct 2025, Ghofrani et al., 2023), natural language processing (Miao et al., 2021), and sequential models (Hu et al., 2023).
1. Core Principles and Problem Statement
SCAL addresses the vulnerability of deep classifiers by fortifying both the feature representations and the decision boundaries against adversarial examples. The canonical adversarially robust classification task is
$$\min_\theta \; \mathbb{E}_{(x,y)} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}_{\mathrm{CE}}\big(f_\theta(x+\delta),\, y\big) \Big],$$
where $f_\theta$ is a neural classifier, $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $\delta$ is an adversarial perturbation, and $\epsilon$ bounds its magnitude (Wang et al., 27 Dec 2024).
Conventional adversarial training focuses on minimizing this objective, but vulnerabilities persist due to poor structure in the feature/embedding space. SCAL rectifies these weaknesses by clustering representations of same-class samples—including adversarially perturbed instances—while separating those of different classes via supervised contrastive penalties. Margin-based enhancements enforce explicit cosine-similarity constraints for additional separation.
2. Loss Functions and Training Objectives
Across various implementations, SCAL frameworks combine three major components:
| Loss Term | Mathematical Formulation | Role |
|---|---|---|
| Cross-Entropy | $\mathcal{L}_{\mathrm{CE}} = -\log\,[f_\theta(x)]_y$ | Classification on clean/adv samples |
| Sup. Contrastive | $\mathcal{L}_{\mathrm{SC}} = \sum_{i} \frac{-1}{\lvert P(i)\rvert} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$ | Intra-class clustering, inter-class separation |
| Margin Contrastive | $\mathcal{L}_{\mathrm{margin}} = \sum_{i} \big[ \sum_{p \in P(i)} \max(0,\, m_+ - \cos(z_i, z_p)) + \sum_{n \in N(i)} \max(0,\, \cos(z_i, z_n) - m_-) \big]$ | Tight intra-class clustering, explicit inter-class margin (SVM-like) |
The training objective is commonly specified as
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CE}} + \lambda_1 \mathcal{L}_{\mathrm{SC}} + \lambda_2 \mathcal{L}_{\mathrm{margin}},$$
with hyperparameters $\lambda_1$ and $\lambda_2$ controlling the tradeoffs (Wang et al., 27 Dec 2024). Alternate formulations use separate weights for the adversarial and clean contrastive terms (Ghofrani et al., 2023), or inject hard-mined positives into the contrastive sum (Bhattacharya et al., 31 Oct 2025).
Margin-based constraints approximate an SVM-type guarantee: positive pairs maintain $\cos(z_i, z_p) \ge m_+$ and negatives $\cos(z_i, z_n) \le m_-$, ensuring an explicit feature-space margin (Wang et al., 27 Dec 2024).
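As a concrete reference, the sketch below shows one way the supervised contrastive term and a hinge-style margin term of this form could be implemented in PyTorch. The function names, default temperature, and margin values are illustrative assumptions, not the exact formulation of any cited paper.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of projections z (one row per sample).
    Positives for an anchor are all other samples sharing its label."""
    z = F.normalize(z, dim=1)
    logits = (z @ z.t()) / tau                      # pairwise scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all non-anchor samples
    logits = logits.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # average log-probability of positives per anchor (anchors without positives are skipped)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return -mean_pos[pos_mask.any(dim=1)].mean()

def margin_contrastive_loss(z, labels, m_pos=0.8, m_neg=0.2):
    """Hinge-style margin term matching the cosine constraints above: push same-class
    similarity above m_pos and different-class similarity below m_neg.
    (Margin values are illustrative, not taken from the cited papers.)"""
    z = F.normalize(z, dim=1)
    cos = z @ z.t()
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    neg_mask = ~pos_mask & ~self_mask
    pos_term = F.relu(m_pos - cos)[pos_mask].mean()
    neg_term = F.relu(cos - m_neg)[neg_mask].mean()
    return pos_term + neg_term
```

In the combined objective, such terms would be evaluated on both clean and adversarial projections and weighted by $\lambda_1$ and $\lambda_2$ as above.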
3. Adversarial Data Generation and Learning Pipeline
Adversarial samples are typically generated with FGSM or PGD:
- FGSM: a single-step perturbation $x^{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}_{\mathrm{CE}}(f_\theta(x), y)\big)$.
- PGD: a multi-iteration update that projects each step back into the $\epsilon$-ball, using gradients of the total SCAL loss (see the sketch below).
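A minimal $\ell_\infty$ PGD sketch in PyTorch is shown below for concreteness; the function name, step size, iteration count, and random start are illustrative assumptions, and cross-entropy stands in for the full SCAL objective.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative L_inf PGD: repeatedly ascend the loss and project the iterate
    back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)   # swap in the total SCAL loss if desired
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep valid pixel range
    return x_adv.detach()
```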
The general training loop is structured as follows:

```python
for epoch in range(E):
    for x, y in loader:
        # Adversarial example generation (FGSM shown; PGD iterates this step)
        x_adv = x + eps * sign(grad_x(L_CE(f_theta(x), y)))

        # Forward pass on both clean and adversarial inputs
        z = proj_head(backbone(x))
        z_adv = proj_head(backbone(x_adv))

        # Compute all loss components
        loss_ce = CE(f_theta(x_adv), y)
        loss_sc = SupContrastive(z, z_adv, y)
        loss_margin = MarginContrastive(z, z_adv, y)
        loss_total = loss_ce + lambda_1 * loss_sc + lambda_2 * loss_margin

        # Optimizer step
        optimizer.zero_grad()
        loss_total.backward()
        optimizer.step()
```
Variants use more sophisticated sample selection for contrastive loss (see Section 4).
4. Positive/Negative Pair Selection and Hard Mining
The choice of positive and negative samples is critical: a standard split by label can produce semantically weak positives and negatives. ASCL (Bui et al., 2021) and ANCHOR (Bhattacharya et al., 31 Oct 2025) propose:
- Local Selection (ASCL): positives and negatives are selected according to the current model's predictions, focusing on high-risk cluster boundaries. The Leaked-LS variant restricts the pool to samples whose predictions match the anchor's, greatly reducing the number of pairs while increasing effectiveness.
- Hard Mining (ANCHOR): low-similarity positives are upweighted in the supervised contrastive loss, forcing the model to resolve "hard" intra-class distinctions. This is implemented via adaptive per-pair weights that grow as the anchor-positive similarity shrinks; the weighting strength is increased throughout training to concentrate the learning signal on hard pairs (Bhattacharya et al., 31 Oct 2025). A sketch of one such weighting scheme follows this list.
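The following sketch illustrates one way such adaptive positive weighting could be folded into the supervised contrastive sum; the power-law weighting, the fixed `gamma`, and the function name are assumptions for illustration, not ANCHOR's published formulation.

```python
import torch
import torch.nn.functional as F

def hard_mined_supcon_loss(z, labels, tau=0.1, gamma=2.0):
    """Supervised contrastive loss with illustrative hard-positive weighting:
    lower-similarity (harder) positives receive larger weights."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t()
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    logits = (sim / tau).masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Harder positives (lower cosine similarity) get larger normalized weights;
    # gamma would typically be increased over training to sharpen the focus.
    weights = ((1.0 - sim).clamp(min=0.0) ** gamma) * pos_mask
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-8)

    loss = -(weights * log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1)
    return loss[pos_mask.any(dim=1)].mean()
```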
Efficient local mining delivers state-of-the-art robustness using a fraction of the positive/negative pairs required by global selection (Bui et al., 2021).
5. Empirical Results and Benchmarks
SCAL frameworks consistently outperform both vanilla adversarial training and standard contrastive learning in adversarial robustness across modalities.
CIFAR-100, ResNet-18, FGSM robust accuracy (%) at perturbation magnitude $\epsilon$ (Wang et al., 27 Dec 2024):
| $\epsilon$ | Baseline CE | SCL (joint) | Refined SCL | Margin SCL |
|---|---|---|---|---|
| 0.01 | 19.7 | 20.4 | 19.8 | 20.1 |
| 0.02 | 15.3 | 16.0 | 14.9 | 15.8 |
| 0.03 | 8.2 | 9.1 | 7.1 | 9.5 |
Margin SCL provides the strongest FGSM robustness at the largest perturbation magnitude ($\epsilon = 0.03$), while joint SCL leads at smaller magnitudes. Joint optimization outperforms post-hoc or stage-wise contrastive training.
CIFAR-10, ResNet-18, PGD-20 (Bhattacharya et al., 31 Oct 2025):
| Method | Robust Acc (PGD-20, %) | Robust Acc (AutoAttack, %) | Clean Acc (%) |
|---|---|---|---|
| Madry AT | 44.05 | 40.07 | 84.48 |
| TRADES | 51.41 | 45.41 | 82.20 |
| AdvCL (APT) | 52.01 | 43.52 | 79.39 |
| ASCL (APT) | 53.09 | 45.70 | 81.67 |
| ANCHOR (APT) | 54.10 | 46.07 | 81.40 |
On both datasets, SCAL instantiations improve robust accuracy by 1–3 percentage points over best previous methods.
Further, representation analysis using centered kernel alignment (CKA) demonstrates that SCAL yields high similarity between clean and adversarial representations, especially in deep layers, which correlates with increased robustness (Ghofrani et al., 2023).
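For reference, clean-versus-adversarial representation similarity of this kind can be measured with linear CKA; the helper below is a generic sketch of the standard linear CKA formula, not the cited papers' implementation.

```python
import torch

def linear_cka(feats_a, feats_b):
    """Linear CKA between two feature matrices of shape (n_samples, dim),
    e.g. clean vs. adversarial activations from the same layer."""
    X = feats_a - feats_a.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = feats_b - feats_b.mean(dim=0, keepdim=True)
    hsic = (Y.t() @ X).norm(p="fro") ** 2             # cross-covariance alignment
    return hsic / ((X.t() @ X).norm(p="fro") * (Y.t() @ Y).norm(p="fro"))

# Example (hypothetical tensors): cka = linear_cka(clean_layer_feats, adv_layer_feats)
```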
6. Domain Extensions and Model Architectures
SCAL generalizes beyond image classification:
- NLP Applications (Miao et al., 2021): SCAL extends to Transformer-based natural language understanding by injecting adversarial perturbations into the input embeddings, yielding consistent gains on GLUE (↑1.75%), TREC/AG's News (↑1.2%), and improved robustness on the ANLI adversarial NLI benchmark (↑2–4 pp over InfoBERT/FreeLB).
- Sequential Architectures (Hu et al., 2023): SACL-LSTM integrates SCAL into emotion recognition in conversations (ERC), perturbing internal LSTM gate states under contextual adversarial training, which improves both macro-F1 and context robustness at a controlled perturbation budget.
Dominant hyperparameters include the temperature ($\tau$), margin values ($m_+$, $m_-$), and loss weights (such as $\lambda_1$ and $\lambda_2$), tuned via held-out validation.
7. Limitations, Future Work, and Theoretical Implications
Limitations include restriction to single-step attacks (e.g., FGSM), lack of certified robustness bounds, and evaluation on modest networks and datasets. Scaling to multi-step PGD, larger architectures, and more complex attack scenarios is a core open direction (Wang et al., 27 Dec 2024, Bhattacharya et al., 31 Oct 2025).
Potential research avenues:
- Theoretical link between margin separation and certified robustness radii.
- Automated joint tuning of regularization weights and margins.
- Extension of hard-negative mining and local sample selection principles to large-scale and unsupervised settings.
- Application to alternative domains (e.g., segmentation, sequence modeling).
A plausible implication is that the synthesis of semantic feature clustering, explicit margin enforcement, and adversarial robustification, as exemplified by SCAL, establishes a general blueprint for adversarially safe representation learning across domains.