Supervised Contrastive Adversarial Learning
- Supervised Contrastive Adversarial Learning (SCAL) is an adversarial defense framework that combines contrastive objectives with adversarial training to create robust feature spaces.
- It employs margin-based constraints and hard-mined contrastive losses to enforce intra-class clustering and inter-class separation under adversarial perturbations.
- Empirical results across image classification, NLP, and sequential models show that SCAL enhances robustness against adversarial attacks while maintaining high clean accuracy.
Supervised Contrastive Adversarial Learning (SCAL) is a family of adversarial defense frameworks designed to improve the robustness of deep neural networks against adversarial attacks. SCAL integrates supervised contrastive learning with adversarial training, often augmented by margin-based constraints or hard-mined contrastive losses. The central premise is that enforcing semantic structure and separation in feature space via contrastive objectives, particularly under adversarial perturbations, yields models that are robust to attack without sacrificing clean accuracy. Key instantiations have been introduced for image classification (Wang et al., 27 Dec 2024, Bhattacharya et al., 31 Oct 2025, Ghofrani et al., 2023), natural language processing (Miao et al., 2021), and sequential models (Hu et al., 2023).
1. Core Principles and Problem Statement
SCAL addresses the vulnerability of deep classifiers by fortifying both the feature representations and the decision boundaries against adversarial examples. The canonical adversarially robust classification task is
$$\min_\theta \; \mathbb{E}_{(x,y)} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}_{\mathrm{CE}}\big(f_\theta(x+\delta),\, y\big) \Big],$$
where $f_\theta$ is a neural classifier, $\mathcal{L}_{\mathrm{CE}}$ is the cross-entropy loss, $\delta$ is an adversarial perturbation, and $\epsilon$ bounds its magnitude (Wang et al., 27 Dec 2024).
Conventional adversarial training focuses on minimizing this objective, but vulnerabilities persist due to poor structure in the feature/embedding space. SCAL rectifies these weaknesses by clustering representations of same-class samples—including adversarially perturbed instances—while separating those of different classes via supervised contrastive penalties. Margin-based enhancements enforce explicit cosine-similarity constraints for additional separation.
2. Loss Functions and Training Objectives
Across various implementations, SCAL frameworks combine three major components:
| Loss Term | Mathematical Formulation | Role |
|---|---|---|
| Cross-Entropy | $\mathcal{L}_{\mathrm{CE}} = -\log\,[f_\theta(x)]_y$ | Classification on clean/adv samples |
| Sup. Contrastive | $\mathcal{L}_{\mathrm{SC}} = \sum_{i} \frac{-1}{\lvert P(i)\rvert} \sum_{p \in P(i)} \log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \in A(i)} \exp(z_i \cdot z_a / \tau)}$ | Intra-class clustering, inter-class separation |
| Margin Contrastive | $\mathcal{L}_{\mathrm{margin}} = \sum_{i} \big[ \sum_{p \in P(i)} \max(0,\, m_+ - \cos(z_i, z_p)) + \sum_{n \in N(i)} \max(0,\, \cos(z_i, z_n) - m_-) \big]$ | Tight intra-class clustering, explicit inter-class margin (SVM-like) |
The training objective is commonly specified as
$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{CE}} + \lambda_1 \mathcal{L}_{\mathrm{SC}} + \lambda_2 \mathcal{L}_{\mathrm{margin}},$$
with hyperparameters $\lambda_1$ and $\lambda_2$ controlling the tradeoffs (Wang et al., 27 Dec 2024). Alternate formulations use separate weights for the adversarial and clean contrastive terms (Ghofrani et al., 2023), or inject hard-mined positives into the contrastive sum (Bhattacharya et al., 31 Oct 2025).
Margin-based constraints approximate an SVM-type guarantee: positive pairs maintain $\cos(z_i, z_p) \ge m_+$ and negatives $\cos(z_i, z_n) \le m_-$, ensuring an explicit feature-space margin (Wang et al., 27 Dec 2024).
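As a concrete reference, the sketch below shows one way the supervised contrastive term and a hinge-style margin term of this form could be implemented in PyTorch. The function names, default temperature, and margin values are illustrative assumptions, not the exact formulation of any cited paper.

```python
import torch
import torch.nn.functional as F

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of projections z (one row per sample).
    Positives for an anchor are all other samples sharing its label."""
    z = F.normalize(z, dim=1)
    logits = (z @ z.t()) / tau                      # pairwise scaled similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all non-anchor samples
    logits = logits.masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # average log-probability of positives per anchor (anchors without positives are skipped)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    mean_pos = log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count
    return -mean_pos[pos_mask.any(dim=1)].mean()

def margin_contrastive_loss(z, labels, m_pos=0.8, m_neg=0.2):
    """Hinge-style margin term matching the cosine constraints above: push same-class
    similarity above m_pos and different-class similarity below m_neg.
    (Margin values are illustrative, not taken from the cited papers.)"""
    z = F.normalize(z, dim=1)
    cos = z @ z.t()
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    neg_mask = ~pos_mask & ~self_mask
    pos_term = F.relu(m_pos - cos)[pos_mask].mean()
    neg_term = F.relu(cos - m_neg)[neg_mask].mean()
    return pos_term + neg_term
```

In the combined objective, such terms would be evaluated on both clean and adversarial projections and weighted by $\lambda_1$ and $\lambda_2$ as above.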
3. Adversarial Data Generation and Learning Pipeline
Adversarial samples are typically generated with FGSM or PGD:
- FGSM: a single-step perturbation $x^{\mathrm{adv}} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_x \mathcal{L}_{\mathrm{CE}}(f_\theta(x), y)\big)$.
- PGD: a multi-iteration update that projects each step back into the $\epsilon$-ball, using gradients of the total SCAL loss (see the sketch below).
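A minimal $\ell_\infty$ PGD sketch in PyTorch is shown below for concreteness; the function name, step size, iteration count, and random start are illustrative assumptions, and cross-entropy stands in for the full SCAL objective.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Illustrative L_inf PGD: repeatedly ascend the loss and project the iterate
    back into the eps-ball around the clean input."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)  # random start
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)   # swap in the total SCAL loss if desired
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()                    # gradient ascent step
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project into eps-ball
            x_adv = x_adv.clamp(0.0, 1.0)                          # keep valid pixel range
    return x_adv.detach()
```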
The general training loop is structured as follows:

```python
for epoch in range(E):
    for x, y in loader:
        # Adversarial example generation (FGSM shown; PGD iterates this step)
        x_adv = x + eps * sign(grad_x(L_CE(f_theta(x), y)))

        # Forward pass on both clean and adversarial inputs
        z = proj_head(backbone(x))
        z_adv = proj_head(backbone(x_adv))

        # Compute all loss components
        loss_ce = CE(f_theta(x_adv), y)
        loss_sc = SupContrastive(z, z_adv, y)
        loss_margin = MarginContrastive(z, z_adv, y)
        loss_total = loss_ce + lambda_1 * loss_sc + lambda_2 * loss_margin

        # Optimizer step
        optimizer.zero_grad()
        loss_total.backward()
        optimizer.step()
```
Variants use more sophisticated sample selection for contrastive loss (see Section 4).
4. Positive/Negative Pair Selection and Hard Mining
The choice of positive and negative samples is critical: a standard split by label can produce semantically weak positives and negatives. ASCL (Bui et al., 2021) and ANCHOR (Bhattacharya et al., 31 Oct 2025) propose:
- Local Selection (ASCL): positives and negatives are selected according to the current model's predictions, focusing on high-risk cluster boundaries. The Leaked-LS variant restricts the pool to samples whose predictions match the anchor's, greatly reducing the number of pairs while increasing effectiveness.
- Hard Mining (ANCHOR): low-similarity positives are upweighted in the supervised contrastive loss, forcing the model to resolve "hard" intra-class distinctions. This is implemented via adaptive per-pair weights that grow as the anchor-positive similarity shrinks; the weighting strength is increased throughout training to concentrate the learning signal on hard pairs (Bhattacharya et al., 31 Oct 2025). A sketch of one such weighting scheme follows this list.
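The following sketch illustrates one way such adaptive positive weighting could be folded into the supervised contrastive sum; the power-law weighting, the fixed `gamma`, and the function name are assumptions for illustration, not ANCHOR's published formulation.

```python
import torch
import torch.nn.functional as F

def hard_mined_supcon_loss(z, labels, tau=0.1, gamma=2.0):
    """Supervised contrastive loss with illustrative hard-positive weighting:
    lower-similarity (harder) positives receive larger weights."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t()
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    logits = (sim / tau).masked_fill(self_mask, float("-inf"))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Harder positives (lower cosine similarity) get larger normalized weights;
    # gamma would typically be increased over training to sharpen the focus.
    weights = ((1.0 - sim).clamp(min=0.0) ** gamma) * pos_mask
    weights = weights / weights.sum(dim=1, keepdim=True).clamp(min=1e-8)

    loss = -(weights * log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1)
    return loss[pos_mask.any(dim=1)].mean()
```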
Efficient local mining delivers state-of-the-art robustness using a fraction of the positive/negative pairs required by global selection (Bui et al., 2021).
5. Empirical Results and Benchmarks
SCAL frameworks consistently outperform both vanilla adversarial training and standard contrastive learning in adversarial robustness across modalities.
CIFAR-100, ResNet-18, FGSM robust accuracy (%) at perturbation magnitude $\epsilon$ (Wang et al., 27 Dec 2024):
| $\epsilon$ | Baseline CE | SCL (joint) | Refined SCL | Margin SCL |
|---|---|---|---|---|
| 0.01 | 19.7 | 20.4 | 19.8 | 20.1 |
| 0.02 | 15.3 | 16.0 | 14.9 | 15.8 |
| 0.03 | 8.2 | 9.1 | 7.1 | 9.5 |
Margin SCL provides the strongest FGSM robustness at the largest perturbation magnitude ($\epsilon = 0.03$), while joint SCL leads at smaller magnitudes. Joint optimization outperforms post-hoc or stage-wise contrastive training.
CIFAR-10, ResNet-18, PGD-20 (Bhattacharya et al., 31 Oct 2025):
| Method | Robust Acc (PGD-20, %) | Robust Acc (AutoAttack, %) | Clean Acc (%) |
|---|---|---|---|
| Madry AT | 44.05 | 40.07 | 84.48 |
| TRADES | 51.41 | 45.41 | 82.20 |
| AdvCL (APT) | 52.01 | 43.52 | 79.39 |
| ASCL (APT) | 53.09 | 45.70 | 81.67 |
| ANCHOR (APT) | 54.10 | 46.07 | 81.40 |
On both datasets, SCAL instantiations improve robust accuracy by 1–3 percentage points over best previous methods.
Further, representation analysis using centered kernel alignment (CKA) demonstrates that SCAL yields high similarity between clean and adversarial representations, especially in deep layers, which correlates with increased robustness (Ghofrani et al., 2023).
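For reference, clean-versus-adversarial representation similarity of this kind can be measured with linear CKA; the helper below is a generic sketch of the standard linear CKA formula, not the cited papers' implementation.

```python
import torch

def linear_cka(feats_a, feats_b):
    """Linear CKA between two feature matrices of shape (n_samples, dim),
    e.g. clean vs. adversarial activations from the same layer."""
    X = feats_a - feats_a.mean(dim=0, keepdim=True)   # center each feature dimension
    Y = feats_b - feats_b.mean(dim=0, keepdim=True)
    hsic = (Y.t() @ X).norm(p="fro") ** 2             # cross-covariance alignment
    return hsic / ((X.t() @ X).norm(p="fro") * (Y.t() @ Y).norm(p="fro"))

# Example (hypothetical tensors): cka = linear_cka(clean_layer_feats, adv_layer_feats)
```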
6. Domain Extensions and Model Architectures
SCAL generalizes beyond image classification:
- NLP Applications (Miao et al., 2021): SCAL extends to Transformer-based natural language understanding by injecting adversarial perturbations into the input embeddings, yielding consistent gains on GLUE (↑1.75%), TREC/AG's News (↑1.2%), and improved robustness on the ANLI adversarial NLI benchmark (↑2–4 pp over InfoBERT/FreeLB).
- Sequential Architectures (Hu et al., 2023): SACL-LSTM integrates SCAL into emotion recognition in conversations (ERC), perturbing internal LSTM gate states under contextual adversarial training, which improves both macro-F1 and context robustness at a controlled perturbation budget.
Dominant hyperparameters include the temperature ($\tau$), margin values ($m_+$, $m_-$), and loss weights (such as $\lambda_1$ and $\lambda_2$), tuned via held-out validation.
7. Limitations, Future Work, and Theoretical Implications
Limitations include restriction to single-step attacks (e.g., FGSM), lack of certified robustness bounds, and evaluation on modest networks and datasets. Scaling to multi-step PGD, larger architectures, and more complex attack scenarios is a core open direction (Wang et al., 27 Dec 2024, Bhattacharya et al., 31 Oct 2025).
Potential research avenues:
- Theoretical link between margin separation and certified robustness radii.
- Automated joint tuning of regularization weights and margins.
- Extension of hard-negative mining and local sample selection principles to large-scale and unsupervised settings.
- Application to alternative domains (e.g., segmentation, sequence modeling).
A plausible implication is that the synthesis of semantic feature clustering, explicit margin enforcement, and adversarial robustification, as exemplified by SCAL, establishes a general blueprint for adversarially safe representation learning across domains.