
Multi-View InfoNCE Loss

Updated 7 April 2026
  • The paper introduces MV-InfoNCE, which unifies multiple view comparisons into a single loss term to reduce gradient conflicts.
  • MV-InfoNCE aggregates all intra-instance positive interactions and inter-instance negatives, ensuring simultaneous alignment and uniformity of representations.
  • Empirical results show that MV-InfoNCE outperforms pairwise methods, improving top-1 accuracy by up to 0.8% on benchmarks like ImageNet and CIFAR.

Multi-View InfoNCE (MV-InfoNCE) is a contrastive loss formulation designed for multi-view self-supervised learning scenarios, addressing limitations of conventional pairwise contrastive objectives when leveraging more than two data augmentations (views) per instance. MV-InfoNCE enables simultaneous alignment of all within-instance views and comprehensive modeling of cross-instance interactions, extending InfoNCE to a principled, single-term objective per instance with alignment and uniformity guarantees (Koromilas et al., 9 Jul 2025).

1. Problem Setup and Notation

Given a mini-batch of $M$ instances, each instance yields $N$ views via diverse stochastic augmentations. Formally, for input $x$, an encoder $f_\theta: \mathcal{X} \to \mathbb{R}^d$ produces view-wise embeddings $u_{i,l} = f_\theta(x_{i,l})$, normalized to unit length for stability. These embeddings are indexed as $U_{i,l}$ in a tensor $U \in \mathbb{R}^{M \times N \times d}$.

The similarity function is defined by scaled cosine similarity:

$$\operatorname{sim}(u, v) = \frac{u^\top v}{\tau}$$

where $\tau > 0$ is a temperature parameter.

For each embedding $u_{i,l}$, the set of positives comprises the other $N-1$ views of the same instance $i$, while the negatives include all views from different instances $j \neq i$.
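The indexing above can be made concrete with a short sketch (NumPy; the helper names and mask construction are illustrative, not from the paper):

```python
import numpy as np

def sim_matrix(U, tau=0.1):
    """Scaled cosine similarities between all M*N view embeddings.

    U: array of shape (M, N, d), one embedding per (instance, view).
    Returns an (M*N, M*N) matrix S with S[a, b] = u_a^T u_b / tau,
    rows grouped by instance (views of instance i occupy rows
    i*N .. i*N + N - 1).
    """
    M, N, d = U.shape
    V = U.reshape(M * N, d)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)  # unit-normalize
    return V @ V.T / tau

def positive_negative_masks(M, N):
    """Boolean masks over the (M*N, M*N) similarity matrix.

    pos[a, b] is True iff a and b are different views of the same
    instance; neg[a, b] is True iff they come from different instances.
    """
    instance = np.repeat(np.arange(M), N)          # instance id per row
    same = instance[:, None] == instance[None, :]  # same-instance pairs
    pos = same & ~np.eye(M * N, dtype=bool)        # drop self-similarities
    neg = ~same
    return pos, neg
```

Each row of the similarity matrix then has exactly $N-1$ positive entries and $(M-1)N$ negative entries.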

2. MV-InfoNCE Loss Definition

MV-InfoNCE generalizes InfoNCE by aggregating all intra-instance view similarities and all inter-instance negatives into a single, per-instance loss. This reduces conflicting gradients and captures higher-order dependencies missed by pairwise summation. The core terms are:

  • Positive sum:

$$\mathrm{Pos}_i = \sum_{l=1}^{N} \sum_{m \neq l} \exp\big(\operatorname{sim}(u_{i,l}, u_{i,m})\big)$$

  • Negative sum:

$$\mathrm{Neg}_i = \sum_{l=1}^{N} \sum_{j \neq i} \sum_{m=1}^{N} \exp\big(\operatorname{sim}(u_{i,l}, u_{j,m})\big)$$

The MV-InfoNCE loss is then:

$$\mathcal{L}_{\mathrm{MV\text{-}InfoNCE}} = -\frac{1}{M} \sum_{i=1}^{M} \log \frac{\mathrm{Pos}_i}{\mathrm{Pos}_i + \mathrm{Neg}_i}$$

This structure ensures that every view for a given instance is encouraged to align with all other views of the same instance, while being uniformly separated from embeddings of different instances.

3. Capturing All View Interactions

MV-InfoNCE structurally differs from conventional multi-view approaches, which typically aggregate $N(N-1)$ ordered pairwise InfoNCE terms per instance. Instead, MV-InfoNCE consolidates interaction modeling into a single term per instance:

  • One Loss Term per Instance: Each instance $i$ contributes one global loss term, eliminating conflicts arising from multiple, overlapping pairwise losses.
  • Simultaneous Alignment: The positive sum encompasses all intra-instance view pairs, requiring the encoder to align every view simultaneously.
  • Comprehensive Negative Energy: The negative sum incorporates all view interactions with other instances, maximizing uniformity across the batch.

This joint treatment yields an objective that forces holistic alignment and uniformity, rather than piecewise pairwise objectives that may introduce suboptimal local minima or miss collective dependencies (Koromilas et al., 9 Jul 2025).

4. Theoretical Characterization

In the large-batch regime ($M \to \infty$), the MV-InfoNCE objective asymptotically decomposes into alignment and uniformity penalties:

$$\mathcal{L}_{\mathrm{MV\text{-}InfoNCE}} \;\longrightarrow\; \underbrace{-\,\mathbb{E}_i\!\left[\log \sum_{l \neq m} e^{\operatorname{sim}(u_{i,l},\,u_{i,m})}\right]}_{\text{alignment}} \;+\; \underbrace{\mathbb{E}_i\!\left[\log\, \mathbb{E}_j \sum_{l,m} e^{\operatorname{sim}(u_{i,l},\,u_{j,m})}\right]}_{\text{uniformity}}$$

The first term penalizes lack of alignment among same-instance views, while the second encourages the global embedding set to be uniformly distributed on the sphere. Global minimization is achieved when all within-instance views are identical (alignment) and all representations are distributed according to the uniform hyperspherical distribution (uniformity).
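These two quantities can be measured directly on learned embeddings; the sketch below follows the standard alignment/uniformity metrics of Wang and Isola (2020), adapted to the (M, N, d) multi-view layout (function names are illustrative, not from the paper):

```python
import numpy as np

def alignment(U):
    """Mean squared distance between same-instance view pairs.

    U: (M, N, d) array of unit-normalized embeddings; lower values
    mean the views of each instance are better aligned.
    """
    M, N, d = U.shape
    total, count = 0.0, 0
    for l in range(N):
        for m in range(l + 1, N):
            total += np.sum((U[:, l] - U[:, m]) ** 2)
            count += M
    return total / count

def uniformity(U, t=2.0):
    """Log of the mean Gaussian potential over all embedding pairs.

    Lower values indicate embeddings spread more uniformly on the
    hypersphere.
    """
    V = U.reshape(-1, U.shape[-1])
    d2 = np.sum((V[:, None] - V[None, :]) ** 2, axis=-1)
    off_diag = ~np.eye(len(V), dtype=bool)  # exclude self-pairs
    return float(np.log(np.mean(np.exp(-t * d2[off_diag]))))
```

A perfectly aligned encoder drives `alignment` to zero, while `uniformity` decreases as the embeddings spread out over the sphere.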

5. Comparison with Two-View InfoNCE

Contrasts between MV-InfoNCE and traditional pairwise (two-view) InfoNCE frameworks are summarized as follows:

| Aspect | Two-View InfoNCE | MV-InfoNCE |
| --- | --- | --- |
| Loss terms per instance | $N(N-1)$ | $1$ |
| Computational order | $O(M^2N^2)$ similarities, split across losses | $O(M^2N^2)$ similarities, single loss |
| Positive interactions | Pairwise only | All cross-view pairings |
| Gradient symmetry | View-of-interest asymmetry | Fully symmetric |

Pairwise objectives yield $N(N-1)$ terms per instance, each focusing on a particular view of interest, resulting in potential gradient interference and incomplete modeling of higher-order dependencies. MV-InfoNCE unifies all positive interactions and negatives, avoids the view-of-interest distinction, and captures all higher-order effects in a single term per instance (Koromilas et al., 9 Jul 2025).
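For contrast, a pairwise-aggregation baseline sums one InfoNCE term per ordered view pair; the sketch below is a simplified SimCLR-style version (an assumed form for illustration, not the paper's exact baseline):

```python
import numpy as np

def pairwise_infonce(U, tau=0.1):
    """Mean of ordered two-view InfoNCE terms: the N(N-1)-term baseline.

    For each instance i and ordered view pair (l, m), u_{i,l} is the
    anchor, u_{i,m} its positive, and view m of every other instance
    serves as a negative.
    """
    M, N, d = U.shape
    V = U / np.linalg.norm(U, axis=-1, keepdims=True)
    loss, terms = 0.0, 0
    for l in range(N):
        for m in range(N):
            if l == m:
                continue
            logits = V[:, l] @ V[:, m].T / tau           # (M, M) anchor vs view m
            logits -= logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(logits)
            correct = p[np.arange(M), np.arange(M)]      # matching-instance entry
            loss += -np.mean(np.log(correct / p.sum(axis=1)))
            terms += 1
    return loss / terms, terms
```

Each of the `terms == N*(N-1)` losses back-propagates through only one positive pair, which is the source of the gradient interference discussed above.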

6. Algorithmic Implementation

MV-InfoNCE admits an efficient batched implementation: the procedure accumulates, for each instance, the positive and negative energy sums, then computes the log-ratio and averages across the batch.
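A minimal batched sketch of this procedure (NumPy; the per-instance log-ratio normalization is an assumed form based on the loss description above, not the authors' reference code):

```python
import numpy as np

def mv_infonce(U, tau=0.1):
    """MV-InfoNCE loss for a batch of multi-view embeddings.

    U: (M, N, d) array; each instance contributes one log-ratio of
    its positive energy to its total (positive + negative) energy.
    """
    M, N, d = U.shape
    V = U / np.linalg.norm(U, axis=-1, keepdims=True)
    F = V.reshape(M * N, d)
    S = np.exp(F @ F.T / tau)                      # exp-similarities
    inst = np.repeat(np.arange(M), N)              # instance id per row
    same = inst[:, None] == inst[None, :]
    pos_mask = same & ~np.eye(M * N, dtype=bool)   # other views, same instance
    pos_row = np.where(pos_mask, S, 0.0).sum(axis=1)
    neg_row = np.where(~same, S, 0.0).sum(axis=1)  # views of other instances
    Pos = np.array([pos_row[inst == i].sum() for i in range(M)])
    Neg = np.array([neg_row[inst == i].sum() for i in range(M)])
    return float(np.mean(-np.log(Pos / (Pos + Neg))))
```

As a sanity check, embeddings whose views agree within each instance and differ across instances yield a lower loss than embeddings whose views collapse across instances.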

7. Empirical Evaluation and Scaling Behavior

MV-InfoNCE achieves superior performance relative to pairwise-aggregation baselines as the number of views increases:

  • Linear Evaluation Protocols: On CIFAR-10/100, ImageNet-100, and ImageNet-1K, MV-InfoNCE consistently surpasses pairwise objectives, with top-1 accuracy improvements of up to approximately 0.8% on ImageNet-1K.
  • Scaling with View Number: Unlike conventional pairwise aggregations, which saturate or degrade as the number of views grows, MV-InfoNCE yields continued accuracy and embedding-geometry improvements at higher view counts.
  • Embedding Quality: k-Nearest Neighbor classification and neighborhood-separability metrics indicate more uniform and better-aligned representation spaces as the number of views $N$ increases.

MV-InfoNCE's empirical scaling properties underscore its suitability for high-multiplicity view regimes, both in unimodal and multimodal settings (Koromilas et al., 9 Jul 2025).
