Temperature-Free H-InfoNCE
- The paper introduces an arctanh-based InfoNCE loss that eliminates the need for temperature tuning while offering robust gradient dynamics.
- It replaces cosine similarity scaling with an unbounded arctanh mapping, facilitating more expressive logits and improved contrastive learning.
- Empirical benchmarks across vision, graphs, NLP, and recommender systems show that H-InfoNCE matches or outperforms manually tuned temperature approaches.
Temperature-Free H-InfoNCE (arctanh-based InfoNCE, “scaled log-odds”) is a hyperparameter-free alternative to the canonical temperature-scaled InfoNCE loss for contrastive representation learning. By replacing the usual temperature scaling of cosine similarities with an arctanh mapping, the loss removes the need to tune the critical temperature parameter and exhibits more robust gradient behavior, with empirical results matching or exceeding manually tuned temperature baselines across vision, graph, anomaly detection, NLP, and recommender system benchmarks (Kim et al., 29 Jan 2025).
1. Standard InfoNCE and Temperature Scaling
The InfoNCE loss is central to contrastive learning, operating on a batch of $N$ anchor examples $x_1, \dots, x_N$, each paired (e.g., via augmentations or alternative views) with a positive sample index $i^+$ and negatives $j \neq i^+$. An encoder maps inputs to $\ell_2$-normalized embeddings $z_i$, yielding cosine similarities $s_{ij} = z_i^\top z_j \in [-1, 1]$. The loss for one anchor is:
$L_i = -\log \frac{\exp(s_{i,i^+}/\tau)}{\sum_{j=1}^{N} \exp(s_{i,j}/\tau)}$
with $\tau > 0$ the temperature parameter. Dividing by $\tau$ rescales similarities before the softmax. Performance is highly sensitive to $\tau$: a small $\tau$ increases “contrast” but risks vanishing gradients for moderate similarities, while a large $\tau$ leaves a nonzero gradient even at the optimum, necessitating expensive hyperparameter tuning.
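The temperature-scaled loss can be written as a short, self-contained NumPy sketch (the batching convention, in which the positive for anchor $i$ sits at row $i$ of a second view matrix, is an illustrative assumption):

```python
import numpy as np

def info_nce(z1, z2, tau=0.25):
    """Temperature-scaled InfoNCE over a batch.

    z1, z2: (N, d) L2-normalized embeddings of two views;
    z2[i] is the positive for anchor z1[i], all other rows are negatives.
    """
    s = z1 @ z2.T                                         # cosine similarities, (N, N)
    logits = s / tau                                      # temperature scaling
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))                 # positives on the diagonal

# Toy usage: identical views, so every positive has similarity exactly 1.
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z = z / np.linalg.norm(z, axis=1, keepdims=True)
loss = info_nce(z, z, tau=0.25)
```

With identical views the diagonal dominates, so shrinking `tau` sharpens the softmax and lowers the loss, illustrating the sensitivity described above.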
2. Construction of Temperature-Free H-InfoNCE
To eliminate temperature dependence while preserving the ability of logits to take values across $(-\infty, \infty)$, H-InfoNCE (Editor’s term) leverages a statistical log-odds transformation. For $s_{ij} \in (-1, 1)$, define $p_{ij} = \frac{1 + s_{ij}}{2} \in (0, 1)$; then,
$\operatorname{logit}(p_{ij}) = \log\frac{1+s_{ij}}{1-s_{ij}} = 2\,\arctanh(s_{ij})$
The H-InfoNCE loss is then specified as:
$L^{\mathrm{H-InfoNCE}} = \sum_{i=1}^{N} \left[ -\log \frac{\exp(2\,\arctanh(s_{i,i^+}))}{\sum_{j=1}^N \exp(2\,\arctanh(s_{i,j}))} \right]$
Or, equivalently:
$L_i = -2 \arctanh(s_{i, i^+}) + \log \sum_{j=1}^N \exp(2 \arctanh(s_{i,j}))$
Thus, $\tau$ is eliminated; each logit is mapped as $s_{ij} \mapsto 2\,\arctanh(s_{ij})$, yielding unbounded logits and more expressive contrast.
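Relative to a standard InfoNCE implementation, the only change is the logit mapping. A minimal NumPy sketch follows; the small clip just below $|s| = 1$ is a numerical safeguard assumed here, not specified in the source:

```python
import numpy as np

def h_info_nce(z1, z2, eps=1e-6):
    """Temperature-free H-InfoNCE: logits are 2 * arctanh(cosine similarity)."""
    s = z1 @ z2.T                                         # cosine similarities, (N, N)
    s = np.clip(s, -1 + eps, 1 - eps)                     # keep arctanh finite
    logits = 2.0 * np.arctanh(s)                          # unbounded, hyperparameter-free
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
z = z / np.linalg.norm(z, axis=1, keepdims=True)
aligned = h_info_nce(z, z)                         # positives fully aligned: loss near 0
mismatched = h_info_nce(z, np.roll(z, 1, axis=0))  # wrong pairing: large loss
```

Note the call signature carries no temperature argument; the function is a drop-in replacement for the scaled version.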
3. Gradient Properties and Analysis
H-InfoNCE’s theoretical advantage is evident in its gradient behavior. Under a toy model with one positive (similarity $s_{i,i^+} = C$) and one negative (similarity $s_{i,j} = 0$), standard InfoNCE yields:
$L = \log\left(1 + e^{-C/\tau}\right)$
whose gradient with respect to $C$ has magnitude $\frac{1}{\tau}\,\sigma(-C/\tau)$ (with $\sigma$ the logistic sigmoid), which can vanish for moderate $C$ when $\tau$ is small. In contrast, H-InfoNCE produces:
$L = \log(1 + e^{-2\arctanh(C)})$
This gradient is strictly positive for $C < 1$ and vanishes only as $C \to 1$, i.e., only at the optimum. There is no middle-range gradient collapse, and no hyperparameter to tune.
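The contrast between the two regimes can be checked numerically in a toy comparison of this kind (one positive at similarity $C$, one negative fixed at similarity $0$; the specific values $C = 0.6$, $\tau = 0.05$ and the finite-difference helper are illustrative choices):

```python
import numpy as np

def toy_infonce(C, tau):
    # Standard InfoNCE in the toy model: positive similarity C, negative 0.
    return np.log(1.0 + np.exp(-C / tau))

def toy_h_infonce(C):
    # H-InfoNCE in the same toy model; since exp(-2*arctanh(C)) = (1-C)/(1+C),
    # this equals log(2 / (1 + C)) and tends to 0 as C -> 1.
    return np.log(1.0 + np.exp(-2.0 * np.arctanh(C)))

def grad(f, C, h=1e-6):
    # Central finite difference stands in for the closed-form derivative.
    return (f(C + h) - f(C - h)) / (2.0 * h)

# Moderate similarity, small temperature: the InfoNCE gradient collapses,
# while the H-InfoNCE gradient stays well away from zero.
g_info = abs(grad(lambda c: toy_infonce(c, tau=0.05), 0.6))
g_h = abs(grad(toy_h_infonce, 0.6))
```

At $C = 0.6$ the temperature-scaled gradient is already orders of magnitude smaller than the temperature-free one, matching the middle-range collapse described above.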
4. Theoretical Insights
The main theoretical claim is that finite logit ranges in standard InfoNCE induce undesirable gradient trade-offs: nonzero gradients near optimum (impairing convergence), or vanishing gradients in the middle (impairing learning). The arctanh mapping resolves this, producing:
- Unbounded logits ($2\,\arctanh(s_{ij}) \in (-\infty, \infty)$) allowing the softmax to approximate the “hard” one-hot distribution.
- Gradients that vanish only at the true optimum, supporting reliable gradient-based cross-entropy minimization. No global convergence proof or Lipschitz constant estimates are provided, but closed-form analysis supports these conclusions.
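The first bullet can be made concrete: as the positive similarity saturates, the arctanh logits diverge and the softmax output approaches one-hot, whereas bounded $s/\tau$ logits plateau for any fixed $\tau$ (the similarity values and $\tau = 0.25$ below are illustrative):

```python
import numpy as np

def softmax(x):
    x = x - x.max()          # shift for numerical stability
    e = np.exp(x)
    return e / e.sum()

# One nearly saturated positive against two negatives.
s = np.array([0.9999, 0.1, -0.2])
p_arctanh = softmax(2.0 * np.arctanh(s))   # unbounded logits: near one-hot
p_temp = softmax(s / 0.25)                 # bounded logits at fixed tau
```

The arctanh-mapped distribution places essentially all mass on the positive, while the temperature-scaled one retains visible mass on the negatives.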
5. Empirical Performance
Extensive benchmarks across five domains compare H-InfoNCE (“Free”) against InfoNCE evaluated over a sweep of temperatures $\tau$; the best-performing $\tau$ is reported in parentheses. Representative results:
| Task | Best InfoNCE (best $\tau$) | H-InfoNCE (“Free”) |
|---|---|---|
| Imagenette kNN-1 | 84.43 (0.25) | 84.65 (0.27) |
| CiteSeer F1 (micro/macro) | 67.33/60.47 (0.50) | 67.95/60.56 |
| CIFAR-10 ROC-AUC (mean) | 97.22 (0.25) | 97.28 |
| MABEL LM/ICAT (StereoSet) | 80.6/71.3 | 81.0/71.7 |
| DCRec HR@1 | 0.1336 | 0.1360 |
Across all tasks, H-InfoNCE matches or exceeds the best manually tuned $\tau$, without any hyperparameter search.
6. Implementation Practices
Image experiments use SimCLR with a ResNet-18 backbone (800 epochs, SGD, batch size 256, standard augmentations); graph experiments employ GRACE with GCN encoders (1000 epochs, Adam); anomaly detection leverages a ResNet-152 on CIFAR-10; NLP bias mitigation uses BERT-base (MABEL, 2 epochs, AdamW, batch size 16); recommender experiments use DCRec (GNN with co-attention, batch size 512, contrastive/regularizer weights as in the baseline). Standard dataset sizes and batch defaults are observed; no extensive ablation over batch size or embedding dimension is reported.
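None of those training stacks is reproduced here, but the end-to-end mechanics can be sketched as a self-contained NumPy toy: two randomly initialized banks of $\ell_2$-normalized embeddings are optimized with analytic H-InfoNCE gradients until paired rows align. The gradient derivation, clip threshold, learning rate, and sizes are all illustrative choices, not taken from the source:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, lr, steps, eps = 8, 16, 0.5, 200, 1e-3

def normalize(Z):
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

def h_infonce_loss_and_grads(Z1, Z2):
    """Loss plus analytic gradients w.r.t. both embedding banks.

    With logits l = 2*arctanh(s), P = row-softmax(l), and s = Z1 @ Z2.T:
      dL/dl = (P - I) / N,   dl/ds = 2 / (1 - s^2).
    Clipped similarities receive zero gradient, mirroring np.clip.
    """
    s = Z1 @ Z2.T
    sc = np.clip(s, -1 + eps, 1 - eps)
    l = 2.0 * np.arctanh(sc)
    l = l - l.max(axis=1, keepdims=True)
    P = np.exp(l)
    P = P / P.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(P[np.arange(N), np.arange(N)]))
    G = (P - np.eye(N)) / N * (2.0 / (1.0 - sc ** 2))   # dL/ds
    G[np.abs(s) >= 1 - eps] = 0.0
    return loss, G @ Z2, G.T @ Z1

Z1 = normalize(rng.normal(size=(N, d)))
Z2 = normalize(rng.normal(size=(N, d)))
init_sim = np.mean(np.sum(Z1 * Z2, axis=1))
for _ in range(steps):
    loss, g1, g2 = h_infonce_loss_and_grads(Z1, Z2)
    Z1 = normalize(Z1 - lr * g1)
    Z2 = normalize(Z2 - lr * g2)
final_sim = np.mean(np.sum(Z1 * Z2, axis=1))   # paired rows end up aligned
```

With no temperature to schedule, the mean similarity of matched pairs rises from near zero toward saturation under plain gradient descent, illustrating the plug-in nature of the loss.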
7. Outlook and Potential Extensions
Temperature-Free H-InfoNCE offers a plug-in replacement for InfoNCE with robust gradient dynamics and total removal of temperature tuning across a wide spectrum of tasks. Its core innovation is replacing the scalar $1/\tau$ scaling with the monotonic, unbounded $2\,\arctanh(\cdot)$ mapping. A plausible implication is that H-InfoNCE could generalize to other self-supervised paradigms (e.g., cross-modal, masked prediction), though empirical validation and further theoretical study (convergence guarantees under stochastic dynamics) remain open for future research (Kim et al., 29 Jan 2025).