
Kernel CurvSSL: Curvature in RKHS

Updated 28 November 2025
  • The paper introduces a self-supervised framework that augments redundancy reduction by explicitly aligning local curvature in an RKHS.
  • It computes discrete curvature from k-nearest neighbor graphs and aligns these metrics across augmented views using a Barlow Twins-style loss.
  • The integrated loss, combining embedding and curvature objectives, leads to measurable performance gains on MNIST and CIFAR-10.

Kernel CurvSSL is a self-supervised representation learning framework that augments conventional non-contrastive redundancy-reduction objectives with explicit curvature regularization in a reproducing kernel Hilbert space (RKHS). It extends the CurvSSL family by formulating discrete local curvature with respect to a kernel-induced geometry and by aligning that curvature between augmented views using a Barlow Twins-style loss on curvature-derived statistics. This approach aims to enforce not only statistical invariance and redundancy reduction but also consistency in the local manifold geometry of learned embeddings, thereby enriching the expressiveness and utility of the resulting representation space (Ghojogh et al., 21 Nov 2025).

1. Discrete Curvature in RKHS

Kernel CurvSSL treats projected embeddings $z_i \in \mathbb{R}^{d_z}$ as vertices whose local geometry is quantified via discrete curvature scores. For each embedding $z_i$, the $k$ nearest neighbors $\{z_{i,a}\}_{a=1}^{k}$ are determined, and their displacements relative to $z_i$ are given by $\breve{z}_{i,a} = z_{i,a} - z_i$. In the RKHS, with feature map $\phi$ and kernel $k(x, y) = \phi(x)^\top \phi(y)$, the local Gram matrix of the neighborhood is

$$[K_i]_{ab} = k(\breve{z}_{i,a}, \breve{z}_{i,b}).$$

Normalization to unit diagonal (mimicking projection onto the unit hypersphere) is performed:

$$K'_i = D_i^{-1/2} K_i D_i^{-1/2}, \qquad D_i = \mathrm{diag}(K_i).$$

The discrete curvature score at $z_i$ is defined as the sum of the off-diagonal entries of $K'_i$:

$$\kappa_i = \sum_{a=1}^{k-1}\sum_{b=a+1}^{k} [K'_i]_{ab}.$$

This score can also be expressed as

$$\kappa_i = \frac{1}{2}\left(\mathbf{1}^\top K'_i \mathbf{1} - \mathrm{tr}(K'_i)\right),$$

and, using the eigenvalues $\{\lambda_j^{(i)}\}$ of $K'_i$, as

$$\kappa_i = \frac{1}{2}\left(\sum_{j=1}^{k} \lambda_j^{(i)} - k\right).$$
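
As a concrete illustration, the following PyTorch sketch computes these curvature scores for a batch of embeddings under an RBF kernel. It is a minimal reading of the formulas above, not the authors' implementation; the function names and the `gamma` bandwidth are assumptions.

```python
import torch

def rbf_kernel(X, Y, gamma=1.0):
    # One admissible choice of kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    return torch.exp(-gamma * torch.cdist(X, Y).pow(2))

def curvature_scores(Z, k=10, gamma=1.0):
    """Discrete RKHS curvature score kappa_i for each row of Z, shape (b, d_z)."""
    b = Z.shape[0]
    with torch.no_grad():                       # neighbor selection carries no gradient
        d = torch.cdist(Z, Z)
        d.fill_diagonal_(float('inf'))          # a point is not its own neighbor
        idx = d.topk(k, dim=1, largest=False).indices
    kappa = []
    for i in range(b):
        V = Z[idx[i]] - Z[i]                    # displacements breve z_{i,a}
        K = rbf_kernel(V, V, gamma)             # local Gram matrix K_i
        s = torch.sqrt(torch.diagonal(K))
        Kn = K / torch.outer(s, s)              # K'_i = D^{-1/2} K_i D^{-1/2}
        kappa.append(0.5 * (Kn.sum() - torch.trace(Kn)))  # off-diagonal sum
    return torch.stack(kappa)                   # shape (b,)
```

Note that for the RBF kernel the diagonal of $K_i$ is already all ones, so the normalization step is a no-op there; it matters for kernels without unit diagonal.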

2. Curvature-based Regularizer

To regularize the local geometry, Kernel CurvSSL computes curvature scores $c_i = \kappa_i(z_i)$ and $c'_i = \kappa_i(z'_i)$ for the two stochastic augmentations of each sample in a mini-batch of size $b$. The curvature vectors are centered and variance-normalized:

$$\tilde{c} = \frac{c - \mu_c \mathbf{1}}{\sigma_c + \varepsilon}, \qquad \tilde{c}' = \frac{c' - \mu_{c'} \mathbf{1}}{\sigma_{c'} + \varepsilon}.$$

A cross-correlation matrix is formed:

$$M = \frac{1}{b}\, \tilde{c}\, \tilde{c}'^{\top},$$

whose diagonal entries correspond to matched samples and whose off-diagonal entries correspond to mismatched pairs. The curvature alignment loss is:

$$\mathcal{L}_{\mathrm{curv}} = \sum_{i=1}^{b} (M_{ii} - 1)^2 + \lambda_{\mathrm{curv}} \sum_{i \neq j} M_{ij}^2.$$

In matrix form:

$$\mathcal{L}_{\mathrm{curv}} = \|\mathrm{diag}(M) - I\|_F^2 + \lambda_{\mathrm{curv}}\, \|M - \mathrm{diag}(M)\|_F^2.$$

This regularizer enforces view-invariance of local curvature and discourages redundancies in curvature patterns across different samples.
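
A minimal sketch of this regularizer, assuming the per-view curvature vectors have already been computed (e.g., by the `curvature_scores` sketch above):

```python
import torch

def curvature_loss(c, c_prime, lambda_curv=1.0, eps=1e-6):
    """Barlow Twins-style alignment loss on per-sample curvature scores.

    c, c_prime: tensors of shape (b,) holding kappa_i for the two views.
    """
    b = c.shape[0]
    c_t = (c - c.mean()) / (c.std() + eps)             # tilde c
    cp_t = (c_prime - c_prime.mean()) / (c_prime.std() + eps)
    M = torch.outer(c_t, cp_t) / b                     # b x b cross-correlation
    on_diag = (torch.diagonal(M) - 1).pow(2).sum()     # matched samples toward 1
    off_diag = (M - torch.diag(torch.diagonal(M))).pow(2).sum()  # decorrelate the rest
    return on_diag + lambda_curv * off_diag
```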

3. Kernel CurvSSL Loss Function

The embedding-level redundancy-reduction loss used in Kernel CurvSSL is analogous to that in Barlow Twins, applied to projected features:

$$\tilde{z}_i = \frac{z_i - \mu_z}{\sigma_z + \varepsilon}, \qquad \tilde{z}'_i = \frac{z'_i - \mu_{z'}}{\sigma_{z'} + \varepsilon},$$

with cross-correlation matrix

$$C = \frac{1}{b}\sum_{i=1}^{b} \tilde{z}_i \tilde{z}_i'^{\top}, \qquad C_{uv} = \frac{1}{b}\sum_{i} \tilde{z}_{i,u}\, \tilde{z}'_{i,v}.$$

The embedding loss is

$$\mathcal{L}_{\mathrm{emb}} = \sum_{u=1}^{d_z}(C_{uu}-1)^2 + \lambda_{\mathrm{emb}} \sum_{u \neq v} C_{uv}^2.$$

The total loss is a weighted sum of the embedding-level and curvature-based losses:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{emb}} + \alpha_{\mathrm{curv}}\, \mathcal{L}_{\mathrm{curv}}.$$

Explicitly,

$$\mathcal{L}_{\mathrm{total}} = \sum_{u}(C_{uu}-1)^2 + \lambda_{\mathrm{emb}}\sum_{u\neq v}C_{uv}^2 + \alpha_{\mathrm{curv}}\left[\sum_{i}(M_{ii}-1)^2 + \lambda_{\mathrm{curv}}\sum_{i\neq j}M_{ij}^2\right].$$
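
Completing the picture, here is a sketch of the embedding-level loss and the weighted sum; as before, the names and defaults are illustrative rather than taken from the paper:

```python
import torch

def embedding_loss(z, z_prime, lambda_emb=1.0, eps=1e-6):
    """Barlow Twins redundancy-reduction loss on projected features, shape (b, d_z)."""
    b = z.shape[0]
    z_t = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)    # per-dimension normalization
    zp_t = (z_prime - z_prime.mean(dim=0)) / (z_prime.std(dim=0) + eps)
    C = z_t.T @ zp_t / b                                 # d_z x d_z cross-correlation
    on_diag = (torch.diagonal(C) - 1).pow(2).sum()
    off_diag = (C - torch.diag(torch.diagonal(C))).pow(2).sum()
    return on_diag + lambda_emb * off_diag

def total_loss(z, z_prime, c, c_prime, alpha_curv=1.0):
    # Weighted sum of the embedding-level and curvature-based objectives;
    # curvature_loss is the sketch from Section 2.
    return embedding_loss(z, z_prime) + alpha_curv * curvature_loss(c, c_prime)
```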

4. Encoder–Projector Architecture and Training Workflow

Kernel CurvSSL employs a standard two-view self-supervised architecture composed of a shared encoder $f_\theta$ and a projection head $g$. The typical training iteration proceeds as follows (a code sketch follows the list):

  • Sample a mini-batch $\{x_i\}_{i=1}^{b}$.
  • Generate two independent stochastic augmentations of each $x_i$, yielding $(x_i, x'_i) \sim \mathcal{T}$.
  • Compute representations: $h_i = f_\theta(x_i)$, $z_i = g(h_i)$ and $h'_i = f_\theta(x'_i)$, $z'_i = g(h'_i)$.
  • Compute per-view means and standard deviations over $\{z_i\}, \{z'_i\}$; normalize to obtain $\tilde{z}_i, \tilde{z}'_i$.
  • Form the cross-correlation matrix $C$ and evaluate $\mathcal{L}_{\mathrm{emb}}$.
  • For each $i$, find the $k$ nearest neighbors by (Euclidean or kernel) distance among $\{z_j\}$, form $K_i$, normalize to $K'_i$, and compute $\kappa_i$; repeat for the second view.
  • Stack the curvature scores $\kappa$ and $\kappa'$; center and scale to obtain $\tilde{c}, \tilde{c}'$.
  • Form $M$ and compute $\mathcal{L}_{\mathrm{curv}}$.
  • Backpropagate $\nabla_\theta \mathcal{L}_{\mathrm{total}}$ through all differentiable operations. Neighbor selection is treated as fixed with respect to gradients; kernel evaluations $k(\cdot,\cdot)$ allow gradients to flow from $\mathcal{L}_{\mathrm{curv}}$ to the parameters.
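
Putting these steps together, one training iteration might look like the following sketch. Here `encoder`, `projector`, and `augment` are placeholders for the user's modules and augmentation pipeline, and the helper functions are the sketches from the preceding sections, not the paper's code:

```python
import torch

def train_step(encoder, projector, optimizer, x, augment, k=10, alpha_curv=1.0):
    """One Kernel CurvSSL iteration (illustrative sketch)."""
    x1, x2 = augment(x), augment(x)            # two independent stochastic views
    z1 = projector(encoder(x1))                # projected embeddings, shape (b, d_z)
    z2 = projector(encoder(x2))
    c1 = curvature_scores(z1, k=k)             # per-sample curvature, view 1
    c2 = curvature_scores(z2, k=k)             # view 2 (neighbor indices detached)
    loss = total_loss(z1, z2, c1, c2, alpha_curv=alpha_curv)
    optimizer.zero_grad()
    loss.backward()                            # gradients flow through kernel evaluations
    optimizer.step()
    return loss.item()
```

Such a step would be called once per mini-batch inside the usual epoch loop.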

5. Geometric and Methodological Context

Non-contrastive self-supervised schemes such as Barlow Twins and VICReg enforce alignment of statistical moments (invariance, variance, covariance) but do not shape the local manifold geometry of embeddings. Kernel CurvSSL introduces an explicit mechanism for regularizing and aligning local bending, captured by neighborhood curvature, across augmentations. This ensures:

  • Local bending consistency: If two augmentations fall nearby in the learned space, they exhibit similar local curvature patterns in their neighbor graphs.
  • Collapse avoidance: By decorrelating curvature across samples, the approach mitigates degenerate solutions where the underlying manifold lacks structural diversity.
  • Nonlinear geometric capture: The RKHS formulation enables modeling of complex, nonlinear local structures, enriching the representation beyond what Euclidean relationships provide.

These considerations underscore a shift from purely statistical regularization towards methods that leverage more sophisticated geometric priors.

6. Experimental Results and Comparative Performance

Kernel CurvSSL was evaluated using ResNet-18 encoders and 2-layer MLP projectors with output dimension $d_z = 128$ on MNIST and CIFAR-10, using 100 and 500 pretraining epochs, respectively. Batch size, neighborhood size, and key hyperparameters were set as $b = 256$, $k = 10$, $\lambda_{\mathrm{emb}} = 1$, $\lambda_{\mathrm{curv}} = 1$, $\alpha_{\mathrm{curv}} = 1$, with an RBF kernel for Kernel CurvSSL.

Standard augmentations were employed: random crop/resize + small rotation (MNIST), and random crop + flip + color jitter + grayscale (CIFAR-10). Optimization used Adam with learning rate $10^{-3}$ and weight decay $10^{-4}$. For evaluation, the encoder was frozen and a linear classifier was trained for 50 epochs (SGD).

| Method | MNIST Top-1 (%) | CIFAR-10 Top-1 (%) |
|---|---|---|
| VICReg | 95.9 | 74.5 |
| Barlow Twins | 94.9 | 73.6 |
| CurvSSL (Euclidean) | 97.9 | 75.1 |
| Kernel CurvSSL | 98.4 | 76.5 |

Kernel CurvSSL outperformed the purely statistical regularizers (VICReg, Barlow Twins) by roughly 2–3.5 points on both datasets, and improved on its Euclidean variant by 0.5 points on MNIST and 1.4 points on CIFAR-10 in linear probe accuracy, indicating the efficacy of explicitly regularizing local geometry in RKHS as a complement to standard SSL approaches (Ghojogh et al., 21 Nov 2025).
