SC-InfoNCE: Scaled Convergence in Contrastive Learning

Updated 7 June 2026

SC-InfoNCE is a contrastive learning method that generalizes InfoNCE by scaling feature alignment with a data-driven transition probability matrix.
It introduces a tunable scalar s to flexibly control clustering strength, balancing intra- and inter-cluster similarities based on augmentation dynamics.
Empirical results across vision, graph, and text domains demonstrate that moderate scaling improves representation fidelity and downstream accuracy.

Scaled Convergence InfoNCE (SC-InfoNCE) is a contrastive learning objective that generalizes the InfoNCE loss by introducing a tunable convergence target, thereby enabling flexible control over feature similarity alignment. Unlike the standard InfoNCE, which promotes uniform clustering based on a constant target, SC-InfoNCE exploits a transition probability matrix (TPM) induced by data augmentations and scales it by a factor $s$ to modulate the influence of augmented view dynamics on learned representations. This framework yields a principled mechanism for tuning alignment strength in accordance with data statistics and downstream requirements (Cheng et al., 15 Nov 2025).

1. Recapitulation of the InfoNCE Objective and Transition Matrix Formalism

Let $\mathcal{D} = \{ x_1, \ldots, x_n \}$ be an unlabeled dataset, and let an augmentation distribution $\mathcal{T}$ induce a finite feature space $S$ with cardinality $m = |S|$. The transition probability matrix $T \in [0,1]^{m \times m}$ is defined as

$T_{ij} = \Pr(\text{augmented view of feature } i \text{ equals feature } j),$

so each row of $T$ sums to 1.

A parametric encoder $f_\theta$ produces embeddings $z_i = f_\theta(t_1(x_i))$ and $z_j = f_\theta(t_2(x_i))$ for independent augmentations $t_1, t_2 \sim \mathcal{T}$. Similarity is measured by cosine similarity (typically computed after $\ell_2$-normalization), $\operatorname{sim}(z_i, z_j) = \frac{z_i^\top z_j}{\|z_i\|\,\|z_j\|}$, scaled by a temperature $\tau > 0$.

The predicted pairwise probability is

$P_{ij} = \frac{\exp(\operatorname{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{N} \exp(\operatorname{sim}(z_i, z_k)/\tau)},$

where $N$ is the batch size. The InfoNCE loss is

$\mathcal{L}_{\text{InfoNCE}} = -\frac{1}{N} \sum_{i=1}^{N} \log P_{i,i^+},$

with $i^+$ the positive index for anchor $i$. In expectation, InfoNCE drives the probability that two views share the same source toward a constant target determined by $T$, promoting uniform “clustering” in representation space (Cheng et al., 15 Nov 2025).
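For concreteness, the InfoNCE loss above can be sketched in plain Python for one small batch. This is a minimal, unvectorized illustration rather than the authors' implementation; the function names, the default `tau=0.5`, and the convention that candidate `k` is the second view of sample `k` are assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax_row(logits):
    """Numerically stable softmax over a single row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def infonce_loss(anchors, candidates, positive_index, tau=0.5):
    """Mean InfoNCE loss over a batch.

    anchors[i] is scored against every entry of candidates, and
    positive_index[i] is the index i+ of its positive view.
    """
    n = len(anchors)
    loss = 0.0
    for i in range(n):
        # P_ik = softmax_k(sim(z_i, z_k) / tau)
        probs = softmax_row([cosine(anchors[i], c) / tau for c in candidates])
        loss -= math.log(probs[positive_index[i]])
    return loss / n

# Usage: two views of two samples; candidate k is view 2 of sample k.
view1 = [[1.0, 0.0], [0.0, 1.0]]
view2 = [[1.0, 0.1], [0.1, 1.0]]
print(infonce_loss(view1, view2, positive_index=[0, 1]))
```

Swapping `positive_index` to `[1, 0]`, which pairs each anchor with the wrong view, yields a larger loss.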

2. SC-InfoNCE: Definition and Mathematical Formulation

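SC-InfoNCE extends InfoNCE by replacing the uniform convergence target with a scaled, data-driven target. A scalar $s > 0$ is introduced to form a new target matrix $\tilde{T}$ whose rows are renormalized powers of the rows of $T$:

$\tilde{T}_{ij} = \frac{T_{ij}^{\,s}}{\sum_{k} T_{ik}^{\,s}},$

so that $s$ multiplies the log-ratios within each row and $s = 1$ recovers $\tilde{T} = T$. The SC-InfoNCE objective is then the cross-entropy between the predicted probability matrix $P$ and the scaled TPM $\tilde{T}$:

$\mathcal{L}_{\text{SC}} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \tilde{T}_{ij} \log P_{ij},$

where, with slight abuse of notation, $\tilde{T}_{ij}$ denotes the target entry for the features underlying samples $i$ and $j$.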
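A minimal sketch of the target construction and the loss follows. It assumes the target is formed by raising each TPM entry to the power $s$ and renormalizing each row, so that $s = 1$ leaves a row-stochastic $T$ unchanged; if the scaling is defined differently, only `scaled_target` would change. The function names are hypothetical.

```python
import math

def scaled_target(T, s):
    """ASSUMED SC-InfoNCE target: T_ij**s renormalized per row.

    Taking logs, log target_ij = s * log T_ij - const_i, so s multiplies
    the log-ratios within each row. s = 1 returns T itself whenever the
    rows of T already sum to 1.
    """
    if s <= 0:
        raise ValueError("s must be positive")
    target = []
    for row in T:
        powered = [t ** s for t in row]
        total = sum(powered)
        target.append([p / total for p in powered])
    return target

def sc_infonce_loss(logits, target):
    """Mean row-wise cross-entropy between softmax(logits) and target.

    logits[i][j] plays the role of sim(z_i, z_j) / tau.
    """
    loss = 0.0
    for row, tgt in zip(logits, target):
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # log P_ij = logits_ij - log Z_i
        loss -= sum(t * (x - log_z) for t, x in zip(tgt, row))
    return loss / len(logits)

T = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]
for s in (0.5, 1.0, 1.5):
    print(s, [round(v, 3) for v in scaled_target(T, s)[0]])
```

Under this assumed form, the diagonal entry of each target row grows with $s$, i.e. the target sharpens toward the most frequent transition.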

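The gradient with respect to the similarity entry $\operatorname{sim}_{ij} = \operatorname{sim}(z_i, z_j)$ is

$\frac{\partial \mathcal{L}_{\text{SC}}}{\partial \operatorname{sim}_{ij}} = \frac{1}{N\tau} \left( P_{ij} - \tilde{T}_{ij} \right),$

which uses the fact that each row of $\tilde{T}$ sums to 1. At stationarity, therefore, $P_{ij} = \tilde{T}_{ij}$.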
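The closed-form gradient can be checked numerically. The sketch below, with hypothetical helper names, compares $(P_{ij} - \tilde{T}_{ij})/(N\tau)$ against central finite differences of the same row-wise cross-entropy; it assumes every target row sums to 1, which the closed form requires.

```python
import math

def row_softmax(row, tau):
    """Softmax of row / tau, computed stably."""
    logits = [x / tau for x in row]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def loss_from_sim(sim, target, tau):
    """-(1/N) * sum_ij target_ij * log P_ij with P = row-softmax(sim / tau)."""
    total = 0.0
    for row, tgt in zip(sim, target):
        probs = row_softmax(row, tau)
        total -= sum(t * math.log(p) for t, p in zip(tgt, probs))
    return total / len(sim)

def analytic_grad(sim, target, tau):
    """Closed form (P_ij - target_ij) / (N * tau); needs target rows to sum to 1."""
    n = len(sim)
    return [[(p - t) / (n * tau) for p, t in zip(row_softmax(row, tau), tgt)]
            for row, tgt in zip(sim, target)]

def numeric_grad(sim, target, tau, eps=1e-6):
    """Central finite differences of loss_from_sim, one entry at a time."""
    grad = []
    for i in range(len(sim)):
        grad_row = []
        for j in range(len(sim[i])):
            plus = [r[:] for r in sim]
            minus = [r[:] for r in sim]
            plus[i][j] += eps
            minus[i][j] -= eps
            diff = loss_from_sim(plus, target, tau) - loss_from_sim(minus, target, tau)
            grad_row.append(diff / (2 * eps))
        grad.append(grad_row)
    return grad

sim = [[0.9, 0.1, -0.2], [0.0, 0.8, 0.3], [-0.1, 0.2, 0.7]]
target = [[0.8, 0.15, 0.05], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]
tau = 0.5
gap = max(abs(a - b)
          for ra, rb in zip(analytic_grad(sim, target, tau), numeric_grad(sim, target, tau))
          for a, b in zip(ra, rb))
print(gap)
```

Setting `sim` to `tau * log(target)` makes the softmax reproduce the target exactly, and the analytic gradient then vanishes, which is the stationarity condition stated above.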

3. Theoretical Properties and Feature Clustering

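Under the assumption of a sufficiently expressive encoder and an infinite data stream, any stationary point of SC-InfoNCE satisfies $P_{ij} = \tilde{T}_{ij}$ for all $i, j$. Through the softmax link $P_{ij} \propto \exp(\operatorname{sim}(z_i, z_j)/\tau)$, this yields $\operatorname{sim}(z_i, z_j) = s\,\tau \log T_{ij} + c_i$, where $c_i$ is a constant that depends only on the anchor $i$. Feature pairs with large $T_{ij}$ (frequent cross-augmentation) therefore attain higher similarity, naturally imposing a soft clustering structure whose affinities are prescribed by $\tilde{T}$.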
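A short worked derivation makes the stationarity argument concrete. It assumes the row-normalized power form $\tilde{T}_{ij} = T_{ij}^{\,s} / \sum_k T_{ik}^{\,s}$ for the target, chosen because it matches the statement that $s$ amplifies log differences; the numbers in the example are illustrative and not taken from the paper.

```latex
% Stationarity: softmax output equals the (assumed) power-normalized target
\frac{\exp(\operatorname{sim}_{ij}/\tau)}{Z_i} = \frac{T_{ij}^{\,s}}{W_i},
\qquad Z_i = \sum_k \exp(\operatorname{sim}_{ik}/\tau), \quad W_i = \sum_k T_{ik}^{\,s}

% Take logs and subtract the same identity for another column k of row i;
% the row constants Z_i and W_i cancel
\operatorname{sim}_{ij} - \operatorname{sim}_{ik} = s\,\tau\,\bigl(\log T_{ij} - \log T_{ik}\bigr)

% Illustrative numbers: T_{ij} = 0.6,\ T_{ik} = 0.2,\ \tau = 0.5,\ \log 3 \approx 1.0986
s = 1:\ \operatorname{sim}_{ij} - \operatorname{sim}_{ik} \approx 0.549,
\qquad
s = 1.5:\ \operatorname{sim}_{ij} - \operatorname{sim}_{ik} \approx 0.824
```

Because cosine similarity lies in $[-1, 1]$, such a stationary point can be reached only if every required gap satisfies $s\,\tau\,|\log(T_{ij}/T_{ik})| \le 2$; larger $s$ tightens this constraint, which offers one possible reading of why excessive scaling can destabilize training.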

The scaling parameter $s$ modulates the geometry: larger $s$ amplifies log differences in transition probabilities (the stationary similarity gap between two pairs grows linearly in $s$), facilitating cluster separation but risking mode collapse when $s$ is too large. Smaller $s$ sharpens sensitivity to local differences but may reduce inter-cluster distinctness. In downstream scenarios matching the co-occurrence pattern of $T$, proper normalization of $\tilde{T}$ aligns pretraining geometry with test-time statistics (Cheng et al., 15 Nov 2025).

4. Algorithmic Implementation

The typical pipeline for SC-InfoNCE pretraining is:

Estimate the transition matrix $T$ via Monte-Carlo simulation over augmentations.
Form the scaled target matrix $\tilde{T}$ from $T$ and $s$.
For each epoch and mini-batch:
- Sample two augmentations per anchor and encode to obtain $2N$ representations.
- Compute all pairwise similarities $\operatorname{sim}(z_i, z_j)$.
- Compute softmax probabilities $P_{ij}$ at temperature $\tau$.
- Calculate the cross-entropy loss $\mathcal{L}_{\text{SC}} = -\frac{1}{N} \sum_{i} \sum_{j} \tilde{T}_{ij} \log P_{ij}$.
- Backpropagate and update the encoder parameters $\theta$.
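The first pipeline step, Monte-Carlo estimation of $T$, can be sketched on a toy feature space. The ring-shaped augmentation `toy_augment` and its `keep=0.7` probability are purely hypothetical stand-ins for a real augmentation pipeline; the estimator simply counts where each feature lands and row-normalizes.

```python
import random

def toy_augment(i, rng, m=4, keep=0.7):
    """HYPOTHETICAL augmentation on a ring of m features: keep feature i
    with probability `keep`, otherwise move to one of its two neighbours."""
    if rng.random() < keep:
        return i
    return (i + rng.choice([-1, 1])) % m

def estimate_tpm(augment, m, draws_per_feature=5000, seed=0):
    """Monte-Carlo estimate of T_ij = Pr(augment(i) == j) for i, j in 0..m-1."""
    rng = random.Random(seed)
    tpm = []
    for i in range(m):
        counts = [0] * m
        for _ in range(draws_per_feature):
            counts[augment(i, rng)] += 1
        tpm.append([c / draws_per_feature for c in counts])
    return tpm

T_hat = estimate_tpm(toy_augment, m=4)
for row in T_hat:
    print([round(v, 3) for v in row])
```

For this toy ring the true matrix has 0.7 on the diagonal, 0.15 for each neighbour and 0 for the opposite feature, so the estimate approaches those values as `draws_per_feature` grows.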

Recommended hyperparameter ranges are:

Scaling factor $s$: moderate values above 1; in the CIFAR-10 ablation below, $s = 1.5$ gave the best accuracy among the settings tested.
Temperature $\tau$: smaller values sharpen the output distribution.
Batch size: 256, 1024, 64, 256.
Learning rate and weight decay as used in base InfoNCE protocols.

5. Empirical Evaluation Across Domains

Experiments were performed using vision (CIFAR-10, CIFAR-100, STL-10, ImageNet-100; ResNet-50 pretrained for 200 epochs), graph (COLLAB, DD, NCI1, PROTEINS; 3-layer GCN), and text (STS-B, SICK-R; BERT-base on 1M Wikipedia sentences). Baselines included SCL, InfoNCE, DCL, DHEL, and f-MICL.

Performance was assessed through linear-probe accuracy. Representative results:

Dataset	Std InfoNCE	SC-InfoNCE (best s)	Δ
CIFAR-10	90.53	91.49	+0.96
CIFAR-100	50.90	51.95	+1.05
ImageNet-100	74.62	75.62	+1.00
STL-10	84.07	85.54	+1.47
COLLAB	75.98	76.28	+0.30
DD	73.92	75.89	+1.97
NCI1	75.38	75.72	+0.34
PROTEINS	70.44	73.41	+2.97
STS-B	74.95	77.64	+2.69
SICK-R	73.87	75.48	+1.61

Ablation over $s$ on CIFAR-10:

s	0.5	1.0	1.5
Acc. (%)	89.1	90.5	91.2

Within this range, accuracy increases with $s$, and the $s = 1.0$ result (90.5) is close to standard InfoNCE (90.53). This suggests that moderately increasing $s$ improves performance; however, excessive scaling risks instability.

6. Practical Considerations and Limitations

Choosing $s$ should be guided by data characteristics:

For mild augmentations and low inter-class separation, increase $s$.
To mitigate embedding collapse (when a few large eigenvalues dominate the embedding covariance), decrease $s$.
For fine-grained tasks (e.g., STS-B), moderate increases in $s$ can improve the fidelity of subtle representation differences.
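One way to act on the collapse heuristic is to monitor how much of the embedding variance sits in the leading covariance direction. The sketch below is an assumed diagnostic, not part of the described method: it estimates the top eigenvalue by power iteration and reports its share of the covariance trace. Values near 1 indicate a nearly one-dimensional embedding; any threshold for decreasing $s$ would have to be chosen empirically.

```python
import math

def covariance(embeddings):
    """Sample covariance (d x d) of a list of d-dimensional embeddings."""
    n, d = len(embeddings), len(embeddings[0])
    mean = [sum(e[k] for e in embeddings) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for e in embeddings:
        c = [e[k] - mean[k] for k in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += c[a] * c[b] / (n - 1)
    return cov

def top_eigenvalue(mat, iters=200):
    """Largest eigenvalue of a symmetric PSD matrix via power iteration."""
    d = len(mat)
    v = [1.0 + 0.1 * k for k in range(d)]  # fixed, non-symmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    lam = 0.0
    for _ in range(iters):
        w = [sum(mat[a][b] * v[b] for b in range(d)) for a in range(d)]
        lam = math.sqrt(sum(x * x for x in w))  # ||M v|| tends to lambda_max
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam

def top_variance_share(embeddings):
    """Fraction of total variance carried by the leading covariance direction."""
    cov = covariance(embeddings)
    trace = sum(cov[k][k] for k in range(len(cov)))
    return top_eigenvalue(cov) / trace if trace > 0 else 0.0

collapsed = [[float(k), float(k)] for k in range(10)]  # all points on one line
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(top_variance_share(collapsed), top_variance_share(spread))
```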

Limitations include the need to estimate $T$, which can be challenging in high-dimensional settings; one may instead approximate $T$ among prototypes or clusters. The global scaling $s$ may not correct for class imbalance, and row- or time-adaptive scaling may be advantageous; this remains an open direction. Excessive $s$ may result in representation collapse if $T$ contains large entries (Cheng et al., 15 Nov 2025).
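The prototype approximation can be sketched as follows, assuming fixed prototypes (for example, cluster centres from k-means) and nearest-prototype assignment by Euclidean distance; both choices and the helper names are hypothetical. Each pair of augmented views contributes one transition in each direction, and rows are normalized to give a $k \times k$ TPM over prototypes.

```python
def nearest_prototype(z, prototypes):
    """Index of the prototype closest to z in squared Euclidean distance."""
    best, best_dist = 0, float("inf")
    for k, p in enumerate(prototypes):
        dist = sum((a - b) ** 2 for a, b in zip(z, p))
        if dist < best_dist:
            best, best_dist = k, dist
    return best

def prototype_tpm(view_pairs, prototypes, smoothing=0.0):
    """Row-normalized transition counts between prototypes.

    view_pairs holds (z1, z2) embeddings of two augmented views of one
    sample; each pair is counted in both directions because the two
    views are exchangeable. `smoothing` adds a pseudo-count per cell.
    """
    k = len(prototypes)
    counts = [[smoothing] * k for _ in range(k)]
    for z1, z2 in view_pairs:
        a = nearest_prototype(z1, prototypes)
        b = nearest_prototype(z2, prototypes)
        counts[a][b] += 1
        counts[b][a] += 1
    # Rows with no visits (and no smoothing) are left as all zeros.
    return [[c / sum(row) if sum(row) > 0 else 0.0 for c in row] for row in counts]

prototypes = [[0.0, 0.0], [10.0, 0.0]]
pairs = [([0.1, 0.0], [0.2, 0.0]),
         ([9.9, 0.0], [0.3, 0.0]),
         ([10.0, 0.0], [10.1, 0.0])]
print(prototype_tpm(pairs, prototypes))
```

A positive `smoothing` keeps rows for rarely visited prototypes valid probability distributions.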

SC-InfoNCE enables principled, tunable alignment to augmentation-induced feature affinities, trading off inter- versus intra-cluster structure and yielding competitive results across multiple domains.


References

Cheng et al. (2025). Understanding InfoNCE: Transition Probability Matrix Induced Feature Clustering.
