AdvInfoNCE: Adversarial Contrastive Learning

Updated 19 May 2026
  • AdvInfoNCE is a family of contrastive learning objectives that adaptively weights adversarial hard negatives to improve robustness.
  • It employs adversarial techniques to generate and emphasize hard negatives in collaborative filtering, multi-modal generation, and robust representation learning.
  • Empirical studies demonstrate significant gains in Recall, NDCG, and adversarial accuracy, underpinned by theoretical guarantees from robust optimization.

Adversarial InfoNCE (AdvInfoNCE) encompasses a family of contrastive learning objectives extending the standard InfoNCE loss with adversarial or hardness-aware mechanisms. AdvInfoNCE is instantiated in multiple domains: collaborative filtering for recommender systems, representation learning for adversarial robustness, and energy-based generative modeling for multi-modal imitation learning. Common to these settings is the explicit adversarial treatment of negatives—whether as learned hard negatives in candidate pools, generator outputs in adversarial games, or adversarially perturbed views in robust feature learning—offering theoretical and empirical advantages over classical contrastive paradigms.

1. Conceptual Overview and Core Formulation

InfoNCE is a contrastive loss widely used to learn representations by distinguishing positive pairs from negative samples. The standard formulation for encoded samples $z_i, z_j$ and a set of negatives $\mathcal{N}(i)$ is:

$$\mathcal{L}_{\mathrm{InfoNCE}}(i, j) = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\exp(\mathrm{sim}(z_i, z_j)/\tau) + \sum_{k \in \mathcal{N}(i)} \exp(\mathrm{sim}(z_i, z_k)/\tau)}$$

where $\mathrm{sim}(u, v)$ is usually cosine or dot-product similarity, and $\tau$ is a temperature parameter.
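The loss above can be sketched for a single anchor in plain Python. This is a minimal illustration, not a reference implementation: the function names (`info_nce`, `cosine`) and the default temperature are assumptions made for the example, and vectors are plain lists to keep the block dependency-free.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def info_nce(anchor, positive, negatives, tau=0.1, sim=cosine):
    """InfoNCE for one anchor: -log of the positive's softmax probability."""
    pos_logit = sim(anchor, positive) / tau
    logits = [pos_logit] + [sim(anchor, n) / tau for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - pos_logit
```

A negative that lies close to the anchor contributes a large term to the denominator and therefore dominates the loss, which is the property the adversarial variants below exploit.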

AdvInfoNCE modifies this principle by:

  • Assigning adaptive weights or margins to negatives based on estimated hardness or adversarial difficulty.
  • Adversarially learning the negative sampling distribution to focus on worst-case negatives.
  • Explicitly incorporating adversarial samples (either in the input or feature space) as negatives, “hard negatives,” or adapted positives.
  • Allowing for asymmetric similarity computation and dynamic weighting to resolve conflicts between instance discrimination and adversarial robustness.

The details of AdvInfoNCE vary by domain and objective, as elaborated in subsequent sections.

2. Hardness-Aware AdvInfoNCE for Collaborative Filtering

In collaborative filtering (CF) with implicit feedback (user–item interaction), vanilla InfoNCE treats all unobserved items as negatives, which makes it susceptible to false negatives and unable to distinguish hard from easy negatives. AdvInfoNCE, introduced in “Empowering Collaborative Filtering with Principled Adversarial Contrastive Loss” (Zhang et al., 2023), addresses these shortcomings by adversarially learning per-instance hardness scores, denoted $\delta_j$ for each negative $j$:

  1. Ranking criterion: For user $u$ and positive item $i$, the goal is to ensure $\forall j \in N_u: s(u,j) - s(u,i) + \delta_j < 0$, where $s(\cdot,\cdot)$ is the similarity score and $\delta_j$ is larger for hard negatives (smaller or negative for potential false negatives).
  2. LogSumExp relaxation: Relaxing the maximum over these constraint violations with LogSumExp leads to a variant of InfoNCE,

$$\mathcal{L}_{\mathrm{AdvInfoNCE}} = -\log \frac{\exp(s(u,i))}{\exp(s(u,i)) + \sum_{j \in N_u} \exp(\delta_j)\exp(s(u,j))}$$

recovering standard InfoNCE for $\delta_j = 0$ (with any temperature scaling absorbed into $s$).

  3. Adversarial hardness learning: The hardness scores are learned in a min–max (adversarial) optimization:

$$\min_{\theta} \max_{\Delta} \mathcal{L}_{\mathrm{AdvInfoNCE}}(\theta, \Delta)$$

where $\theta$ are model parameters and $\Delta$ collects all $\delta_j$.

  4. DRO interpretation: With $\delta_j = \log\big(|N_u| \cdot p(j \mid u, i)\big)$, where $p(\cdot \mid u, i)$ is a distribution over the negatives, this procedure is equivalent to KL-constrained distributionally robust optimization (DRO), enforcing robustness to negative sampling shifts.
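The steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes the hardness scores are tied to a distribution over negatives via $\delta_j = \log(n \cdot p_j)$ with $p = \mathrm{softmax}(w)$ and $n$ the number of negatives, and the names `hardness_from_logits`, `adv_info_nce`, and `ascend_hardness` are hypothetical. Only the adversary's (max) step is shown; the model's (min) step would be an ordinary gradient-descent update on the same loss.

```python
import math

def hardness_from_logits(w):
    """Map free logits w to hardness scores delta_j = log(n * p_j), p = softmax(w)."""
    m = max(w)
    exps = [math.exp(x - m) for x in w]
    z = sum(exps)
    p = [e / z for e in exps]
    return [math.log(len(w) * pj) for pj in p], p

def adv_info_nce(pos_score, neg_scores, deltas):
    """-log( e^{s_pos} / (e^{s_pos} + sum_j e^{delta_j} * e^{s_j}) )."""
    logits = [pos_score] + [s + d for s, d in zip(neg_scores, deltas)]
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits)) - pos_score

def ascend_hardness(pos_score, neg_scores, w, lr=1.0):
    """One gradient-ascent step on w: the adversary's move in the min-max game."""
    deltas, p = hardness_from_logits(w)
    logits = [pos_score] + [s + d for s, d in zip(neg_scores, deltas)]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    q = [e / z for e in exps[1:]]  # dL/d(delta_j): softmax weight of negative j
    total = sum(q)
    # Chain rule through delta_j = log(n) + log p_j with p = softmax(w):
    # d(delta_j)/d(w_k) = 1[j == k] - p_k, hence dL/dw_k = q_k - p_k * sum_j q_j.
    grad = [qk - pk * total for qk, pk in zip(q, p)]
    return [wk + lr * g for wk, g in zip(w, grad)]
```

Starting from `w = [0, 0, 0]` all hardness scores are zero and the loss equals standard InfoNCE. After one ascent step the highest-scoring negative receives a positive $\delta_j$ and the lower-scoring ones receive negative values, so the loss rises while $\sum_j \exp(\delta_j)$ stays fixed at $n$.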

Empirical results on multiple datasets (KuaiRec, Tencent, Yahoo!R3, Coat) show improvements of up to +21.9% Recall@20 and +24.1% NDCG@20 over InfoNCE, with consistent gains even under distribution shift and across different backbone architectures such as LightGCN and MF (Zhang et al., 2023). AdvInfoNCE automatically emphasizes popular (hard) negatives while down-weighting long-tail (likely false negative) items.

3. Adversarial InfoNCE in Multi-Modal and Generative Behavior Cloning

In multi-modal behavior cloning and energy-based generative models, standard InfoNCE does not account for mode collapse in the generator, nor does it provide adversarial pressure to match energy landscapes with generator distributions. In “EBGAN-MDN” (Li et al., 8 Oct 2025), AdvInfoNCE emerges as follows:

  • Contrasting with generator samples: The denominator of InfoNCE is extended to include generator outputs as adversarial (hard) negatives, weighted by a factor $\lambda$. Schematically, in the notation of Section 1 (with $\lambda$ and $\hat{z}_m$ used here for the weight and the generator samples):

$$\mathcal{L}_{\mathrm{Adv}}(i, j) = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\exp(\mathrm{sim}(z_i, z_j)/\tau) + \sum_{k \in \mathcal{N}(i)} \exp(\mathrm{sim}(z_i, z_k)/\tau) + \lambda \sum_{m} \exp(\mathrm{sim}(z_i, \hat{z}_m)/\tau)}$$

where $\hat{z}_m$ are samples drawn from $G$, and $G$ is a generator (MDN).

  • Dynamic weighting: $\lambda$ is scheduled to decrease as training progresses, focusing learning on generator outputs early on and relaxing as the generator improves.
  • Two-player game: The energy model is updated to assign low energy to real and plausible generated samples, and high energy to collapsed or invalid ones. The generator is trained to produce modes that reach low energy.
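The weighted-negative idea in the list above can be sketched as follows. This is a hedged illustration rather than the EBGAN-MDN implementation: the logits are assumed to be precomputed scores (for an energy model, e.g. negative energies), the linear decay schedule and its endpoints are hypothetical, and both function names are invented for the example.

```python
import math

def generator_weight(step, total_steps, w_start=1.0, w_end=0.1):
    """Linearly decay the weight on generator negatives (hypothetical schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + frac * (w_end - w_start)

def info_nce_with_generator_negatives(pos_logit, neg_logits, gen_logits, weight):
    """InfoNCE whose denominator also sums `weight`-scaled generator negatives."""
    all_logits = [pos_logit] + list(neg_logits) + list(gen_logits)
    m = max(all_logits)  # shared shift for numerical stability
    denom = math.exp(pos_logit - m)
    denom += sum(math.exp(l - m) for l in neg_logits)
    denom += weight * sum(math.exp(l - m) for l in gen_logits)
    return m + math.log(denom) - pos_logit
```

With `weight = 0` the function reduces to plain InfoNCE over the ordinary negatives. Larger weights raise the loss whenever generator samples score close to the positive, which is the adversarial pressure described above.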

Empirical studies show that incorporating generator outputs in AdvInfoNCE substantially improves multi-modal generative coverage and sharpness, as seen in KL and Wasserstein metrics—outperforming non-adversarial InfoNCE variants and non-energy-based baselines (Li et al., 8 Oct 2025).

4. AdvInfoNCE for Adversarial Robustness in Multi-Modal Encoders

AdvInfoNCE also arises as a “clean–adversarial InfoNCE” in the adversarial calibration of unified multi-modal encoders (Liao et al., 17 May 2025). Here, the contrastive batch consists of both clean and adversarial examples. A standard form of this objective over a batch of $B$ samples is

$$\mathcal{L} = -\frac{1}{B}\sum_{i=1}^{B} \log \frac{\exp(s(z_i, z_i^{\mathrm{adv}}))}{\sum_{j=1}^{B} \exp(s(z_i, z_j^{\mathrm{adv}}))}$$

where $z_i$ and $z_i^{\mathrm{adv}}$ are the clean and adversarial embeddings for the same sample, and $s(\cdot,\cdot)$ is the temperature-scaled cosine similarity.
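A minimal sketch of a clean–adversarial InfoNCE over a batch is given below. It assumes that the positive for each clean embedding is the adversarial embedding of the same sample and that the adversarial embeddings of the other samples in the batch act as negatives; that batch construction, the function name, and the default temperature are assumptions of this example, not details confirmed by the source.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def clean_adv_info_nce(clean, adv, tau=0.07):
    """Mean over i of -log softmax_j s(clean_i, adv_j), evaluated at j = i."""
    losses = []
    for i, z in enumerate(clean):
        logits = [cosine(z, a) / tau for a in adv]  # temperature-scaled cosine
        m = max(logits)
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(lse - logits[i])
    return sum(losses) / len(losses)
```

The loss is small when every adversarial embedding stays closer to its own clean embedding than to any other sample's, which is the alignment property the calibration aims for.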

Key characteristics include:

  • Frozen encoders: Only modality-specific projection heads are trained.
  • Adversarial examples: Generated offline (e.g. with AutoAttack under a fixed perturbation budget) and used solely for fine-tuning.
  • Empirical robustness: InfoNCE-based objectives yield the best adversarial improvements (up to +10.4% AutoAttack accuracy) while preserving clean performance, outperforming direct alignment and cross-entropy alternatives (Liao et al., 17 May 2025).

Theoretical analysis supports that minimizing InfoNCE on clean–adversarial pairs induces local regularity and upper-bounds the worst-case shift in embeddings.

5. Asymmetric and Generalized Adversarial InfoNCE Variants

A generic framework for adversarial contrastive learning is presented in “Adversarial Contrastive Learning via Asymmetric InfoNCE” (Yu et al., 2022), introducing a family of A-InfoNCE objectives. Novel features include:

  • Asymmetric similarity: Gradient flow is selectively modulated by an $\alpha$ parameter in

$$\widetilde{\mathrm{sim}}_{\alpha}(z_i, z_j) = \alpha \cdot \mathrm{sim}(z_i, \mathrm{sg}(z_j)) + (1 - \alpha) \cdot \mathrm{sim}(\mathrm{sg}(z_i), z_j)$$

where $\mathrm{sg}(\cdot)$ denotes the stop-gradient operator.

  • Inferior positives: Adversarial views are treated as positives but with down-weighted significance or reduced gradient via $\alpha$.
  • Hard negatives: Alternatively, adversarial (or strongly similar) views serve as reweighted negatives.
  • PU-debiasing: Weights correct for false negatives using positive-unlabeled learning priors.
  • Adaptive annealing: Dynamic scheduling of the weighting coefficients during training further improves robust accuracy.
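To make the asymmetric gradient flow concrete, the sketch below assumes the similarity is a stop-gradient mixture, `alpha * sim(z_i, sg(z_j)) + (1 - alpha) * sim(sg(z_i), z_j)`, with `sim` taken to be the dot product. Both choices are assumptions of this example rather than a quotation of Yu et al.; gradients are written out analytically so the block needs no autodiff library, and the function name is hypothetical.

```python
def asymmetric_similarity(z_i, z_j, alpha):
    """Value and gradients of alpha*sim(z_i, sg(z_j)) + (1-alpha)*sim(sg(z_i), z_j).

    sim is the dot product; sg (stop-gradient) blocks the gradient of its argument
    but leaves the forward value unchanged.
    """
    value = sum(a * b for a, b in zip(z_i, z_j))  # the two terms sum to one dot product
    grad_z_i = [alpha * b for b in z_j]           # only the first term reaches z_i
    grad_z_j = [(1.0 - alpha) * a for a in z_i]   # only the second term reaches z_j
    return value, grad_z_i, grad_z_j
```

With `alpha = 1.0` the second view receives no gradient at all, and with `alpha = 0.5` both views receive half of the usual gradient. Intermediate values let an adversarial view influence the clean view less than the reverse, which is one way to read the "inferior positive" treatment.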

Empirically, A-InfoNCE consistently outperforms adversarial contrastive baselines (AdvCL, RoCL) on CIFAR-10, CIFAR-100, and STL-10, boosting both clean and robust accuracy across standard and strong adversarial attacks. Combined “IP+HN” (inferior positive + hard negative) objectives yield the best tradeoffs, with 1–2% improvement in robust accuracy on challenging benchmarks (Yu et al., 2022).

6. Training Protocols and Practical Recommendations

Across domains, AdvInfoNCE objectives share key training and hyperparameter practices:

  • Batch construction: Use sufficiently large batches to sample hard (informative) negatives; Zhang et al. (2023) give a recommended range for CF tasks.
  • Hardness modeling: Embedding-based mappings for learning $\delta_j$ or hardness weights are effective, with normalization for negative sampling probabilities.
  • Projection architectures: MLP heads (optionally parameter-efficient via LoRA) atop frozen feature extractors are effective for robust representation learning (Liao et al., 17 May 2025).
  • Adversarial training schedules: Alternate updates for model vs. hardness generator, or schedule the weight on adversarial samples (e.g. the generator-negative weight in (Li et al., 8 Oct 2025)), with early stopping when validation metrics plateau.
  • Computational cost: AdvInfoNCE typically introduces marginal overhead relative to standard InfoNCE, on the order of $O(\text{batch} \cdot \text{negatives} \cdot \text{dim})$ per step (Zhang et al., 2023).
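The scheduling advice in the list above can be sketched with two small helpers. Everything here is hypothetical scaffolding: the alternation period, the patience, the tolerance, and the function names are assumptions chosen for illustration and are not taken from the cited papers.

```python
def should_stop(history, patience=3, min_delta=1e-4):
    """True once the validation metric has not improved for `patience` evaluations."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    return max(history[-patience:]) <= best_before + min_delta

def update_target(step, adversary_every=5):
    """Alternate updates: every `adversary_every`-th step trains the hardness generator."""
    return "adversary" if step % adversary_every == adversary_every - 1 else "model"
```

A training loop would call `update_target(step)` to decide which parameters to update, append a validation metric such as Recall@20 to `history` after each evaluation, and exit once `should_stop(history)` returns true.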

7. Theoretical Guarantees and Broader Implications

AdvInfoNCE possesses several theoretical properties:

  • DRO equivalence: The KL-constrained DRO formulation guarantees that the learned model is robust to worst-case shifts in the negative distribution and focuses the contrastive signal on substantial hard negatives (Zhang et al., 2023).
  • Mutual information bound: AdvInfoNCE remains a lower bound on mutual information; extra (weighted) negatives only tighten the estimate (Li et al., 8 Oct 2025).
  • Lipschitz control: In adversarial representation learning, minimizing InfoNCE on clean–adv pairs regularizes the mapping and promotes stability under perturbations (Liao et al., 17 May 2025).
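The intuition behind the DRO property can be checked numerically. The sketch below uses the KL-penalized (Lagrangian) form of the worst-case problem rather than the constrained form, which is an assumption of this example; for that form, the maximizer of $\sum_j p_j \ell_j - \lambda\,\mathrm{KL}(p \,\|\, \mathrm{uniform})$ is an exponential tilt of the uniform distribution and the optimal value is a scaled log-mean-exp. The per-negative losses and function names are hypothetical.

```python
import math

def kl_worst_case(losses, lam):
    """Closed-form maximizer of sum_j p_j * l_j - lam * KL(p || uniform)."""
    m = max(losses)
    exps = [math.exp((l - m) / lam) for l in losses]
    z = sum(exps)
    p = [e / z for e in exps]                   # exponential tilt toward large losses
    value = m + lam * math.log(z / len(losses))  # lam * log-mean-exp(l / lam)
    return p, value

def penalized_objective(p, losses, lam):
    """Evaluate sum_j p_j * l_j - lam * KL(p || uniform) for any distribution p."""
    n = len(losses)
    kl = sum(pj * math.log(pj * n) for pj in p if pj > 0)
    return sum(pj * l for pj, l in zip(p, losses)) - lam * kl
```

The worst-case distribution puts the most mass on the negatives with the largest loss, and the resulting log-sum-exp value has the same shape as the InfoNCE denominator, which is why the adversary ends up emphasizing hard negatives.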

Broader implications include:

  • Generalizability: AdvInfoNCE frameworks can readily absorb multi-view, cross-modal, or data-quality asymmetries by adjusting positive/negative sets, similarity flows (the asymmetry parameter of Section 5), and sampling weights.
  • Model-agnostic robustness: AdvInfoNCE achieves consistent gains independent of CF backbone (GCN, MF, UltraGCN), and similarly demonstrates strong transfer in encoder-based adversarial robustness and multi-modal generation settings.

AdvInfoNCE, in its various incarnations, represents a principled adversarial extension of contrastive learning frameworks. By adaptively highlighting hard negatives—whether through adversarial example construction, generator competition, or learned hardness—AdvInfoNCE yields theoretical robustness guarantees and state-of-the-art empirical results across collaborative filtering, multi-modal learning, and adversarial representation alignment (Zhang et al., 2023, Li et al., 8 Oct 2025, Liao et al., 17 May 2025, Yu et al., 2022).
