
Kernel CurvSSL: Curvature in RKHS

Updated 28 November 2025
  • The paper introduces a self-supervised framework that augments redundancy reduction by explicitly aligning local curvature in an RKHS.
  • It computes discrete curvature from k-nearest neighbor graphs and aligns these metrics across augmented views using a Barlow Twins-style loss.
  • The integrated loss, combining embedding and curvature objectives, leads to measurable performance gains on MNIST and CIFAR-10.

Kernel CurvSSL is a self-supervised representation learning framework that augments conventional non-contrastive redundancy-reduction objectives with explicit curvature regularization in a reproducing kernel Hilbert space (RKHS). It extends the CurvSSL family by formulating discrete local curvature with respect to a kernel-induced geometry and by aligning that curvature between augmented views using a Barlow Twins-style loss on curvature-derived statistics. This approach aims to enforce not only statistical invariance and redundancy reduction but also consistency in the local manifold geometry of learned embeddings, thereby enriching the expressiveness and utility of the resulting representation space (Ghojogh et al., 21 Nov 2025).

1. Discrete Curvature in RKHS

Kernel CurvSSL treats projected embeddings $z_i \in \mathbb{R}^{d_z}$ as vertices whose local geometry is quantified via discrete curvature scores. For each embedding $z_i$, the $k$ nearest neighbors $\{z_{i,a}\}_{a=1}^{k}$ are determined, and their displacements relative to $z_i$ are given by $\breve{z}_{i,a} = z_{i,a} - z_i$. In the RKHS, with feature map $\phi$ and kernel $k(x, y) = \phi(x)^\top \phi(y)$, the local Gram matrix of the neighborhood is

$$[K_i]_{ab} = k(\breve{z}_{i,a}, \breve{z}_{i,b}).$$

Normalization to unit diagonal (mimicking projection onto the unit hypersphere) is performed:

$$K'_i = D_i^{-1/2} K_i D_i^{-1/2}, \qquad D_i = \mathrm{diag}(K_i).$$

The discrete curvature score at $z_i$ is defined as the sum of the off-diagonal entries of $K'_i$:

$$\kappa_i = \sum_{a=1}^{k-1}\sum_{b=a+1}^{k} [K'_i]_{ab}.$$

This score can also be expressed as

$$\kappa_i = \frac{1}{2}\left(\mathbf{1}^\top K'_i \mathbf{1} - \mathrm{tr}(K'_i)\right),$$

and, using the eigenvalues $\{\lambda_j^{(i)}\}$ of $K'_i$, as

$$\kappa_i = \frac{1}{2}\left(\sum_{j=1}^{k} \lambda_j^{(i)} - k\right).$$
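
As a concrete illustration, the following PyTorch sketch computes these curvature scores for a batch of embeddings under an RBF kernel. It is a minimal reading of the formulas above, not the authors' implementation; the function names and the `gamma` bandwidth are assumptions.

```python
import torch

def rbf_kernel(X, Y, gamma=1.0):
    # One admissible choice of kernel: k(x, y) = exp(-gamma * ||x - y||^2)
    return torch.exp(-gamma * torch.cdist(X, Y).pow(2))

def curvature_scores(Z, k=10, gamma=1.0):
    """Discrete RKHS curvature score kappa_i for each row of Z, shape (b, d_z)."""
    b = Z.shape[0]
    with torch.no_grad():                       # neighbor selection carries no gradient
        d = torch.cdist(Z, Z)
        d.fill_diagonal_(float('inf'))          # a point is not its own neighbor
        idx = d.topk(k, dim=1, largest=False).indices
    kappa = []
    for i in range(b):
        V = Z[idx[i]] - Z[i]                    # displacements breve z_{i,a}
        K = rbf_kernel(V, V, gamma)             # local Gram matrix K_i
        s = torch.sqrt(torch.diagonal(K))
        Kn = K / torch.outer(s, s)              # K'_i = D^{-1/2} K_i D^{-1/2}
        kappa.append(0.5 * (Kn.sum() - torch.trace(Kn)))  # off-diagonal sum
    return torch.stack(kappa)                   # shape (b,)
```

Note that for the RBF kernel the diagonal of $K_i$ is already all ones, so the normalization step is a no-op there; it matters for kernels without unit diagonal.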

2. Curvature-based Regularizer

To regularize the local geometry, Kernel CurvSSL computes curvature scores $c_i = \kappa_i(z_i)$ and $c'_i = \kappa_i(z'_i)$ for the two stochastic augmentations of each sample in a mini-batch of size $b$. The curvature vectors are centered and variance-normalized:

$$\tilde{c} = \frac{c - \mu_c \mathbf{1}}{\sigma_c + \varepsilon}, \qquad \tilde{c}' = \frac{c' - \mu_{c'} \mathbf{1}}{\sigma_{c'} + \varepsilon}.$$

A cross-correlation matrix is formed:

$$M = \frac{1}{b}\, \tilde{c}\, \tilde{c}'^{\top},$$

whose diagonal entries correspond to matched samples and whose off-diagonal entries correspond to mismatched pairs. The curvature alignment loss is:

$$\mathcal{L}_{\mathrm{curv}} = \sum_{i=1}^{b} (M_{ii} - 1)^2 + \lambda_{\mathrm{curv}} \sum_{i \neq j} M_{ij}^2.$$

In matrix form:

$$\mathcal{L}_{\mathrm{curv}} = \|\mathrm{diag}(M) - I\|_F^2 + \lambda_{\mathrm{curv}}\, \|M - \mathrm{diag}(M)\|_F^2.$$

This regularizer enforces view-invariance of local curvature and discourages redundancies in curvature patterns across different samples.
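
A minimal sketch of this regularizer, assuming the per-view curvature vectors have already been computed (e.g., by the `curvature_scores` sketch above):

```python
import torch

def curvature_loss(c, c_prime, lambda_curv=1.0, eps=1e-6):
    """Barlow Twins-style alignment loss on per-sample curvature scores.

    c, c_prime: tensors of shape (b,) holding kappa_i for the two views.
    """
    b = c.shape[0]
    c_t = (c - c.mean()) / (c.std() + eps)             # tilde c
    cp_t = (c_prime - c_prime.mean()) / (c_prime.std() + eps)
    M = torch.outer(c_t, cp_t) / b                     # b x b cross-correlation
    on_diag = (torch.diagonal(M) - 1).pow(2).sum()     # matched samples toward 1
    off_diag = (M - torch.diag(torch.diagonal(M))).pow(2).sum()  # decorrelate the rest
    return on_diag + lambda_curv * off_diag
```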

3. Kernel CurvSSL Loss Function

The embedding-level redundancy-reduction loss used in Kernel CurvSSL is analogous to that in Barlow Twins, applied to projected features:

$$\tilde{z}_i = \frac{z_i - \mu_z}{\sigma_z + \varepsilon}, \qquad \tilde{z}'_i = \frac{z'_i - \mu_{z'}}{\sigma_{z'} + \varepsilon},$$

with cross-correlation matrix

$$C = \frac{1}{b}\sum_{i=1}^{b} \tilde{z}_i \tilde{z}_i'^{\top}, \qquad C_{uv} = \frac{1}{b}\sum_{i} \tilde{z}_{i,u}\, \tilde{z}'_{i,v}.$$

The embedding loss is

$$\mathcal{L}_{\mathrm{emb}} = \sum_{u=1}^{d_z}(C_{uu}-1)^2 + \lambda_{\mathrm{emb}} \sum_{u \neq v} C_{uv}^2.$$

The total loss is a weighted sum of the embedding-level and curvature-based losses:

$$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{emb}} + \alpha_{\mathrm{curv}}\, \mathcal{L}_{\mathrm{curv}}.$$

Explicitly,

$$\mathcal{L}_{\mathrm{total}} = \sum_{u}(C_{uu}-1)^2 + \lambda_{\mathrm{emb}}\sum_{u\neq v}C_{uv}^2 + \alpha_{\mathrm{curv}}\left[\sum_{i}(M_{ii}-1)^2 + \lambda_{\mathrm{curv}}\sum_{i\neq j}M_{ij}^2\right].$$
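
Completing the picture, here is a sketch of the embedding-level loss and the weighted sum; as before, the names and defaults are illustrative rather than taken from the paper:

```python
import torch

def embedding_loss(z, z_prime, lambda_emb=1.0, eps=1e-6):
    """Barlow Twins redundancy-reduction loss on projected features, shape (b, d_z)."""
    b = z.shape[0]
    z_t = (z - z.mean(dim=0)) / (z.std(dim=0) + eps)    # per-dimension normalization
    zp_t = (z_prime - z_prime.mean(dim=0)) / (z_prime.std(dim=0) + eps)
    C = z_t.T @ zp_t / b                                 # d_z x d_z cross-correlation
    on_diag = (torch.diagonal(C) - 1).pow(2).sum()
    off_diag = (C - torch.diag(torch.diagonal(C))).pow(2).sum()
    return on_diag + lambda_emb * off_diag

def total_loss(z, z_prime, c, c_prime, alpha_curv=1.0):
    # Weighted sum of the embedding-level and curvature-based objectives;
    # curvature_loss is the sketch from Section 2.
    return embedding_loss(z, z_prime) + alpha_curv * curvature_loss(c, c_prime)
```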

4. Encoder–Projector Architecture and Training Workflow

Kernel CurvSSL employs a standard two-view self-supervised architecture composed of a shared encoder $f_\theta$ and a projection head $g$. The typical training iteration proceeds as follows (a code sketch follows the list):

  • Sample a mini-batch $\{x_i\}_{i=1}^{b}$.
  • Generate two independent stochastic augmentations of each $x_i$, yielding $(x_i, x'_i) \sim \mathcal{T}$.
  • Compute representations: $h_i = f_\theta(x_i)$, $z_i = g(h_i)$ and $h'_i = f_\theta(x'_i)$, $z'_i = g(h'_i)$.
  • Compute per-view means and standard deviations over $\{z_i\}, \{z'_i\}$; normalize to obtain $\tilde{z}_i, \tilde{z}'_i$.
  • Form the cross-correlation matrix $C$ and evaluate $\mathcal{L}_{\mathrm{emb}}$.
  • For each $i$, find the $k$ nearest neighbors by (Euclidean or kernel) distance among $\{z_j\}$, form $K_i$, normalize to $K'_i$, and compute $\kappa_i$; repeat for the second view.
  • Stack the curvature scores $\kappa$ and $\kappa'$; center and scale to obtain $\tilde{c}, \tilde{c}'$.
  • Form $M$ and compute $\mathcal{L}_{\mathrm{curv}}$.
  • Backpropagate $\nabla_\theta \mathcal{L}_{\mathrm{total}}$ through all differentiable operations. Neighbor selection is treated as fixed with respect to gradients; kernel evaluations $k(\cdot,\cdot)$ allow gradients to flow from $\mathcal{L}_{\mathrm{curv}}$ to the parameters.
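
Putting these steps together, one training iteration might look like the following sketch. Here `encoder`, `projector`, and `augment` are placeholders for the user's modules and augmentation pipeline, and the helper functions are the sketches from the preceding sections, not the paper's code:

```python
import torch

def train_step(encoder, projector, optimizer, x, augment, k=10, alpha_curv=1.0):
    """One Kernel CurvSSL iteration (illustrative sketch)."""
    x1, x2 = augment(x), augment(x)            # two independent stochastic views
    z1 = projector(encoder(x1))                # projected embeddings, shape (b, d_z)
    z2 = projector(encoder(x2))
    c1 = curvature_scores(z1, k=k)             # per-sample curvature, view 1
    c2 = curvature_scores(z2, k=k)             # view 2 (neighbor indices detached)
    loss = total_loss(z1, z2, c1, c2, alpha_curv=alpha_curv)
    optimizer.zero_grad()
    loss.backward()                            # gradients flow through kernel evaluations
    optimizer.step()
    return loss.item()
```

Such a step would be called once per mini-batch inside the usual epoch loop.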

5. Geometric and Methodological Context

Non-contrastive self-supervised schemes such as Barlow Twins and VICReg enforce alignment of statistical moments (invariance, variance, covariance) but do not shape the local manifold geometry of embeddings. Kernel CurvSSL introduces an explicit mechanism for regularizing and aligning local bending, captured by neighborhood curvature, across augmentations. This ensures:

  • Local bending consistency: If two augmentations fall nearby in the learned space, they exhibit similar local curvature patterns in their neighbor graphs.
  • Collapse avoidance: By decorrelating curvature across samples, the approach mitigates degenerate solutions where the underlying manifold lacks structural diversity.
  • Nonlinear geometric capture: The RKHS formulation enables modeling of complex, nonlinear local structures, enriching the representation beyond what Euclidean relationships provide.

These considerations underscore a shift from purely statistical regularization towards methods that leverage more sophisticated geometric priors.

6. Experimental Results and Comparative Performance

Kernel CurvSSL was evaluated using ResNet-18 encoders and 2-layer MLP projectors with output dimension $d_z = 128$ on MNIST and CIFAR-10, using 100 and 500 pretraining epochs, respectively. Batch size, neighborhood size, and key hyperparameters were set as $b = 256$, $k = 10$, $\lambda_{\mathrm{emb}} = 1$, $\lambda_{\mathrm{curv}} = 1$, $\alpha_{\mathrm{curv}} = 1$, with an RBF kernel for Kernel CurvSSL.

Standard augmentations were employed: random crop/resize + small rotation (MNIST), and random crop + flip + color jitter + grayscale (CIFAR-10). Optimization used Adam with learning rate $10^{-3}$ and weight decay $10^{-4}$. For evaluation, the encoder was frozen and a linear classifier was trained for 50 epochs (SGD).

| Method | MNIST Top-1 (%) | CIFAR-10 Top-1 (%) |
|---|---|---|
| VICReg | 95.9 | 74.5 |
| Barlow Twins | 94.9 | 73.6 |
| CurvSSL (Euclidean) | 97.9 | 75.1 |
| Kernel CurvSSL | 98.4 | 76.5 |

Kernel CurvSSL outperformed the purely statistical regularizers (VICReg, Barlow Twins) by roughly 2–3.5 points on both datasets, and improved on its Euclidean variant by 0.5 points on MNIST and 1.4 points on CIFAR-10 in linear probe accuracy, indicating the efficacy of explicitly regularizing local geometry in RKHS as a complement to standard SSL approaches (Ghojogh et al., 21 Nov 2025).
