Double-Constant Embedding Model (CCEM)

Updated 15 February 2026

The Double-Constant Embedding Model (CCEM) is a geometric framework that characterizes optimal sigmoid-based contrastive learning embeddings using a one-parameter family.
The method uses an ETF-based construction in which all positive inner products equal $c_1$ and all negative inner products equal $c_2$, reducing a high-dimensional optimization to a single-variable minimization.
Synthetic experiments and phase transition analysis confirm that varying the temperature parameter moves the optimum between equiangular tight frames and antipodal configurations.

The Double-Constant Embedding Model (CCEM) is a geometric framework devised to analyze and parameterize the optimal embedding structures that arise under sigmoid-based contrastive learning objectives, as used in recent models such as SigLIP. Its core insight is that, under the sigmoid loss, optimal embeddings conform to a "double-constant" structure: all positive pairs share one inner product and all negative pairs share another. This structure can be parameterized as a one-parameter family interpolating between equiangular tight frames and antipodal configurations. The characterization of the global optimum is thereby reduced from a high-dimensional search to a tractable one-variable minimization, with implications for understanding embedding geometry, phase transitions, and the choice of loss parameters (Lee et al., 2024).

1. Contrastive Learning Problem and Sigmoid Loss

The setting for CCEM analysis considers $N$ "positive" pairs $\{(\vec{x}_i, \vec{y}_i)\}_{i=1}^N$ in $\mathbb{R}^d$, constrained to lie on the unit sphere ($\|\vec{x}_i\| = \|\vec{y}_i\| = 1$). Each $(i,i)$ entry is treated as a positive pair; all other cross-pairs $(i,j)$ with $i \neq j$ are "negative." The loss of interest is the sigmoid contrastive loss (employed in SigLIP), given by:

$L_{\text{sig}}(X, Y) = -\frac{1}{N} \sum_{i=1}^N \log\frac{1}{1 + \exp(-t \vec{x}_i^\top \vec{y}_i + b)} - \frac{1}{N} \sum_{i=1}^N \sum_{j \neq i} \log\frac{1}{1 + \exp(t \vec{x}_i^\top \vec{y}_j - b)}$

with temperature $t > 0$ and bias $b \geq 0$. The objective is to identify

$(X^*, Y^*) = \arg\min_{\{\vec{x}_i, \vec{y}_i\} \subset S^{d-1}} L_{\text{sig}}(X, Y)$

This loss structure is characteristic of contrastive learning frameworks that use the sigmoid criterion, differentiating them from InfoNCE-based approaches.
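As a concrete reading of the formula, the loss can be evaluated directly from the matrix of inner products. The following is a minimal NumPy sketch, not the SigLIP implementation; the function name `sigmoid_loss` and the row-wise layout of `X` and `Y` are assumptions made for illustration.

```python
import numpy as np

def sigmoid_loss(X, Y, t, b):
    """Sigmoid contrastive loss for unit-norm rows of X and Y (shape N x d)."""
    sims = X @ Y.T                        # sims[i, j] = x_i . y_j
    n = sims.shape[0]
    pos = np.diag(sims)                   # positive pairs (i, i)
    neg = sims[~np.eye(n, dtype=bool)]    # negative pairs (i, j) with i != j
    # -log(1 / (1 + exp(z))) = log(1 + exp(z)), evaluated stably by logaddexp(0, z)
    pos_term = np.logaddexp(0.0, -t * pos + b).sum()
    neg_term = np.logaddexp(0.0, t * neg - b).sum()
    return (pos_term + neg_term) / n

# Toy check: two orthonormal pairs with x_i = y_i and t = b = 2.
X = np.eye(2)
Y = np.eye(2)
loss = sigmoid_loss(X, Y, t=2.0, b=2.0)
print(loss)
```

In this toy case each positive term contributes $\log 2$ and each negative term $\log(1 + e^{-2})$, so the returned value is their sum.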

2. Structure and Parameterization of the Double-Constant Embedding Model

The central result motivating CCEM is that any optimal configuration $(X^*, Y^*)$ can, without loss of generality, be taken to satisfy the double-constant property: all positive inner products $\vec{x}_i^\top \vec{y}_i$ are equal to $c_1$, and all negative inner products $\vec{x}_i^\top \vec{y}_j$ for $i \neq j$ are equal to $c_2$.

This structure admits a parameterization via a single non-negative scalar $\delta$:

Let $\{\widetilde{x}_i\}_{i=1}^N \subset \mathbb{R}^{d-1}$ be an $(N-1)$-simplex equiangular tight frame (ETF), i.e., $\|\widetilde{x}_i\|=1$ and $\widetilde{x}_i^\top \widetilde{x}_j = -1/(N-1)$ for $i \neq j$ (which requires $d \geq N$).
The embeddings in $\mathbb{R}^d$ are constructed by appending $\pm\delta$ to each ETF vector and renormalizing:

$\vec{x}_i(\delta) = \frac{1}{\sqrt{1+\delta^2}} \begin{pmatrix} \widetilde{x}_i \\ \delta \end{pmatrix}, \quad \vec{y}_i(\delta) = \frac{1}{\sqrt{1+\delta^2}} \begin{pmatrix} \widetilde{x}_i \\ -\delta \end{pmatrix}$

for $i=1,\dots,N$.

From this construction:

Positive pairwise inner products: $\vec{x}_i^\top \vec{y}_i = \frac{1-\delta^2}{1+\delta^2}$
Negative pairwise inner products ( $i \neq j$ ): $\vec{x}_i^\top \vec{y}_j = -\frac{1/(N-1) + \delta^2}{1+\delta^2}$

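As $\delta \to 0$, $\vec{x}_i$ and $\vec{y}_i$ coincide and the simplex ETF structure is recovered. As $\delta \to \infty$, the construction approaches the "antipodal" configuration $\vec{x}_i = -\vec{y}_i$, in which all $\vec{x}_i$ collapse to a single point and all $\vec{y}_i$ to its antipode.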
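The construction and its two inner-product formulas can be checked numerically. The sketch below is illustrative only: the helper names `simplex_etf` and `ccem_embeddings` are hypothetical, and for simplicity the simplex ETF is built in $\mathbb{R}^N$ (as normalized rows of a centered identity matrix) rather than in $\mathbb{R}^{N-1}$, so the embeddings live in $\mathbb{R}^{N+1}$. The inner products do not depend on this choice.

```python
import numpy as np

def simplex_etf(n):
    """Rows are n unit vectors in R^n with pairwise inner product -1/(n-1)."""
    centered = np.eye(n) - np.full((n, n), 1.0 / n)
    return centered * np.sqrt(n / (n - 1))

def ccem_embeddings(n, delta):
    """Append +delta / -delta to each ETF vector and renormalize."""
    etf = simplex_etf(n)
    last = np.full((n, 1), float(delta))
    scale = 1.0 / np.sqrt(1.0 + delta ** 2)
    x = scale * np.hstack([etf, last])
    y = scale * np.hstack([etf, -last])
    return x, y

n, delta = 5, 0.5
x, y = ccem_embeddings(n, delta)
gram = x @ y.T                                    # gram[i, j] = x_i . y_j
off_diag = gram[~np.eye(n, dtype=bool)]

c1 = (1 - delta ** 2) / (1 + delta ** 2)          # predicted positive value
c2 = -(1 / (n - 1) + delta ** 2) / (1 + delta ** 2)  # predicted negative value
print(np.allclose(np.diag(gram), c1), np.allclose(off_diag, c2))
```

With $N = 5$ and $\delta = 0.5$ the formulas give $c_1 = 0.6$ and $c_2 = -0.4$, and every row of `x` and `y` has unit norm.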

3. Theoretical Justification for Sufficiency of CCEM

The double-constant property can be imposed without loss of optimality under mild conditions on the loss: it must be a sum of a convex decreasing function applied to each positive inner product and a convex increasing function applied to each negative inner product. The sigmoid loss satisfies both conditions, since $\log(1 + e^{z})$ is convex and increasing in $z$. Under this property, the optimization over the $2Nd$ coordinates of $(X, Y)$ reduces to a one-dimensional search along the CCEM curve $(X(\delta), Y(\delta))$.

A two-step application of Jensen's inequality shows that, for any fixed mean positive inner product $c_1$, the CCEM construction minimizes the mean negative inner product. Therefore, a global optimum of $L_{\text{sig}}$ lies on the one-dimensional CCEM curve parameterized by $\delta$.
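The two steps can be sketched as follows. This is a reconstruction under the stated convexity assumptions and may differ in detail from the argument of Lee et al.; the symbols $\ell_+$ and $\ell_-$ (the convex decreasing and convex increasing per-pair losses) and $\bar{c}_1$, $\bar{c}_2$ (the mean positive and mean negative inner products) are notation introduced here.

First, Jensen's inequality applied to each of the two sums gives

$\frac{1}{N}\sum_{i} \ell_+(\vec{x}_i^\top \vec{y}_i) + \frac{1}{N}\sum_{i}\sum_{j \neq i} \ell_-(\vec{x}_i^\top \vec{y}_j) \geq \ell_+(\bar{c}_1) + (N-1)\,\ell_-(\bar{c}_2)$

with equality when all positive inner products are equal and all negative inner products are equal. Second, since $\ell_-$ is increasing, it suffices to lower-bound $\bar{c}_2$ for fixed $\bar{c}_1$. Using $\vec{a}^\top \vec{b} \geq -\frac{1}{4}\|\vec{a} - \vec{b}\|^2$ and the convexity of the squared norm,

$N(N-1)\,\bar{c}_2 = \left(\sum_i \vec{x}_i\right)^\top \left(\sum_j \vec{y}_j\right) - N\bar{c}_1 \geq -\frac{N}{4}\sum_i \|\vec{x}_i - \vec{y}_i\|^2 - N\bar{c}_1 = -\frac{N^2}{2}(1 - \bar{c}_1) - N\bar{c}_1$

Setting $\bar{c}_1 = \frac{1-\delta^2}{1+\delta^2}$ turns this bound into $\bar{c}_2 \geq -\frac{1/(N-1)+\delta^2}{1+\delta^2}$, which is exactly the negative inner product attained by the CCEM construction.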

4. Closed-Form Analysis, Phase Transition, and Embedding Geometry

The optimization of $L_{\text{sig}}$ thus reduces to:

$\delta^* = \arg\min_{\delta \geq 0} L_{\text{sig}}(X(\delta), Y(\delta))$
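Substituting the two constants from the construction into $L_{\text{sig}}$ makes the one-variable objective explicit. Each of the $N$ positive terms is identical, as is each of the $N(N-1)$ negative terms, so, writing $c_1(\delta) = \frac{1-\delta^2}{1+\delta^2}$ and $c_2(\delta) = -\frac{1/(N-1)+\delta^2}{1+\delta^2}$ as shorthand introduced here for the two inner products:

$L_{\text{sig}}(X(\delta), Y(\delta)) = \log\left(1 + e^{-t\,c_1(\delta) + b}\right) + (N-1)\log\left(1 + e^{t\,c_2(\delta) - b}\right)$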

For bias parameter $b = t$, closed-form boundaries are established:

For $N=3$: the minimizer is always $\delta^* = 0$ (simplex ETF) for all $t>0$.
For $N \geq 4$:
- $\delta^* = 0$ (ETF) when $t > \frac{N-1}{N} \log(N-3)$.
- $\delta^* = \infty$ (antipodal) when $t < \frac{1}{2} \log\left(\frac{N-2}{2}\right)$.
- For intermediate values, $\delta^*$ varies continuously, producing a one-parameter family of embeddings interpolating between ETF and antipodal structure.
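These regimes can be checked numerically. The sketch below is an illustration, not code from the paper: it evaluates the sigmoid loss on the CCEM curve and locates the minimizer by grid search. The substitution $u = \delta^2/(1+\delta^2) \in [0, 1]$ is an assumption introduced here so that $\delta = \infty$ maps to the finite endpoint $u = 1$; the function names and grid size are likewise hypothetical.

```python
import numpy as np

def reduced_loss(u, n, t, b):
    """Sigmoid loss on the CCEM curve, with u = delta^2 / (1 + delta^2) in [0, 1]."""
    c1 = 1.0 - 2.0 * u                  # positive inner product (1 - d^2) / (1 + d^2)
    c2 = -(1.0 - u) / (n - 1) - u       # negative inner product
    return np.logaddexp(0.0, -t * c1 + b) + (n - 1) * np.logaddexp(0.0, t * c2 - b)

def optimal_delta(n, t, b=None, grid=2001):
    """Grid-search minimizer delta* (np.inf at the antipodal endpoint)."""
    b = t if b is None else b           # default to the b = t setting
    u = np.linspace(0.0, 1.0, grid)
    u_star = u[np.argmin(reduced_loss(u, n, t, b))]
    if u_star >= 1.0:
        return np.inf
    return float(np.sqrt(u_star / (1.0 - u_star)))

def thresholds(n):
    """(antipodal threshold, ETF threshold) for b = t and n >= 4."""
    return 0.5 * np.log((n - 2) / 2), (n - 1) / n * np.log(n - 3)

n = 10
t_antipodal, t_etf = thresholds(n)
# One temperature below the antipodal threshold, one in between, one above the ETF threshold.
for t in (0.5 * t_antipodal, 0.5 * (t_antipodal + t_etf), 2.0 * t_etf):
    print(f"t = {t:.3f}  delta* = {optimal_delta(n, t):.3f}")
```

In terms of $u$, both inner products are linear, so the reduced loss is convex on $[0, 1]$; the two thresholds are the temperatures at which its derivative changes sign at $u = 0$ and at $u = 1$, respectively.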

This defines a phase transition in embedding geometry as the temperature $t$ is varied: high temperatures favor the ETF configuration (each positive pair perfectly aligned, negative pairs equiangular at $-1/(N-1)$), while low temperatures collapse to the antipodal configuration (all $\vec{x}_i$ coincide, all $\vec{y}_i$ coincide at the antipode, and every pair, positive or negative, has inner product $-1$).

| Regime | $\delta^*$ | Geometric configuration |
| --- | --- | --- |
| $t \gg \log N$ | $0$ | Equiangular tight frame (ETF) |
| $t \lesssim \frac{1}{2}\log N$ | $\infty$ | Antipodal (collapse) |
| Intermediate | $0 < \delta^* < \infty$ | One-parameter interpolation |

5. Synthetic Experiments and Empirical Validation

Threshold predictions for the phase boundaries were validated in synthetic experiments, both by directly optimizing $\{\vec{x}_i, \vec{y}_i\}$ on the sphere and by training a two-layer neural network. The normalized positive-pair similarity

$s = \frac{1}{2}\left(1 + \frac{1}{N} \sum_{i=1}^N \vec{x}_i^\top \vec{y}_i\right)$

was used as an order parameter. Empirical findings across various $N$ and $d$ (e.g., $N = 10, 20, 50$, with $d=N$ or $d=N/2$) showed abrupt transitions in $s(t, N)$ at $t \approx \frac{N-1}{N}\log(N-3)$ (ETF threshold) and $t \approx \frac{1}{2}\log\left(\frac{N-2}{2}\right)$ (antipodal threshold), confirming the theoretical predictions.
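On the CCEM curve the order parameter has a simple form: substituting $\vec{x}_i^\top \vec{y}_i = \frac{1-\delta^2}{1+\delta^2}$ gives $s = \frac{1}{1+\delta^2}$, so $s = 1$ in the ETF regime and $s \to 0$ in the antipodal regime. A minimal sketch of the measurement itself follows; the function name `order_parameter` and the random test vectors are illustrative assumptions, not the experimental setup of the paper.

```python
import numpy as np

def order_parameter(X, Y):
    """s = (1 + mean positive-pair inner product) / 2 for unit-norm rows."""
    mean_pos = np.mean(np.sum(X * Y, axis=1))   # average of x_i . y_i
    return 0.5 * (1.0 + mean_pos)

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 10))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # project rows onto the unit sphere

s_aligned = order_parameter(X, X)      # x_i = y_i, as at delta = 0
s_antipodal = order_parameter(X, -X)   # x_i = -y_i, as at delta -> infinity
print(s_aligned, s_antipodal)
```

Note that $s$ depends only on the positive pairs, so it distinguishes the aligned and antipodal regimes but does not by itself certify that the negatives are equiangular.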

6. Geometric and Practical Implications for Contrastive Learning

CCEM characterizes a continuous family of embedding geometries, controlled by the sigmoid temperature $t$, which mediates the balance between aligning positives and repelling negatives. For large-scale models (e.g., SigLIP), this implies a practical requirement: $t$ should grow at least on the order of $\log N$, and comfortably exceed the ETF threshold, to ensure ETF-like structure and avoid the antipodal regime, a pathology in which the embedding structure degenerates. This requirement helps explain why SigLIP employs relatively large temperatures to match the performance of CLIP's InfoNCE loss, with the added benefit of computational efficiency.

A plausible implication is that this framework provides a principled guiding criterion for parameter selection in sigmoid-based contrastive models and explains observed behaviors in embedding geometry as loss parameters are varied (Lee et al., 2024).

References

Lee et al. (2024). Analysis of Using Sigmoid Loss for Contrastive Learning.
