
Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning

Published 28 Jan 2026 in cs.LG | (2601.20154v1)

Abstract: Self-supervised learning (SSL) has improved empirical performance by unleashing the power of unlabeled data for practical applications. Specifically, SSL extracts representations from massive unlabeled data, which are then transferred to a variety of downstream tasks with limited data. The significant improvement across diverse applications of representation learning has attracted increasing attention, resulting in a variety of dramatically different self-supervised learning objectives for representation extraction, with an assortment of learning procedures, but no clear and unified understanding. This absence hampers the ongoing development of representation learning, leaving theoretical understanding missing, principles for efficient algorithm design unclear, and the use of representation learning methods in practice unjustified. The urgency of a unified framework is further motivated by the rapid growth in representation learning methods. In this paper, we therefore develop a principled foundation for representation learning. We first theoretically investigate the sufficiency of representations from a spectral view, which reveals the spectral essence of existing successful SSL algorithms and paves the path to a unified framework for understanding and analysis. Such a framework also inspires the development of more efficient and easier-to-use representation learning algorithms, in a principled way, for real-world applications.

Summary

  • The paper establishes that singular functions of the conditional operator provide sufficient representations for any downstream prediction task.
  • It unifies diverse SSL algorithms—including contrastive, non-contrastive, and energy-based methods—by interpreting them as estimators of spectral decompositions.
  • The analysis offers practical insights on scalability and optimization, revealing how unbiased gradients and low-rank representations impact performance.

Spectral Foundations of Representation Learning: A Unified Theory

Introduction

"Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning" (2601.20154) develops a rigorous theoretical framework for understanding the foundations of self-supervised representation learning (SSL) through the lens of spectral analysis and operator theory. The manuscript addresses the central gaps in current SSL research: lack of principled unification across diverse algorithmic paradigms, unclear sufficiency criteria for learned representations, and absence of systematic tools for downstream task performance analysis. The spectral framework presented connects classical statistical embedding methods—such as PCA, LPP, and CCA—with contemporary practice in contrastive and non-contrastive SSL. The paper offers precise theoretical criteria for "sufficient" representations, synthesizes objectives from classical spectral and energy-based formulations, and delineates the practical algorithmic implications, especially regarding optimization and scalability in large-scale SSL contexts.

Spectral Sufficiency of Representations

The paper formally establishes that for any prediction task E[y∣x], sufficient representations are the singular functions (left or right) of the conditional operator P(y∣x)—a spectral perspective inherited from operator theory and component analysis. The singular value decomposition (SVD) of the conditional operator, P(y∣x) = ⟨ϕ(x), μ(y)⟩, determines the minimal—yet maximally expressive—subspaces for representing all conditional expectations. This immediately connects with, and generalizes, classical results concerning the sufficiency of "principal components" or canonical variates in regression and classification problems.
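The sufficiency claim can be checked numerically in a finite-dimensional toy setting (the matrices and variable names below are our illustration, not the paper's code): for a discrete joint distribution, the SVD of the conditional matrix yields features ϕ(x) in which every conditional expectation E[f(y)∣x] is linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete toy setting: 6 x-states, 4 y-states.
joint = rng.random((6, 4))
joint /= joint.sum()
cond = joint / joint.sum(axis=1, keepdims=True)  # rows are P(y | x)

# SVD of the conditional operator: cond = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(cond, full_matrices=False)
phi = U * s   # spectral representation phi(x): one row per x-state
mu = Vt       # mu(y): one column per y-state

# Any conditional expectation E[f(y)|x] is linear in phi(x).
f = rng.random(4)            # arbitrary downstream target f(y)
direct = cond @ f            # E[f(y)|x] computed directly
via_repr = phi @ (mu @ f)    # same quantity through the representation
assert np.allclose(direct, via_repr)
```

The same algebra holds for any f, which is the finite-dimensional shadow of the sufficiency statement: a single spectral basis serves all downstream targets through a linear head.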

A major theoretical result is that the spectral basis ϕ(x) (and its generalization ψ(x) formed by the union of all bases for all potential conditional tasks) is provably sufficient to represent any downstream target—possibly through a downstream linear model in the space of representations. The spectral framework thus resolves ambiguities about task sufficiency and links representing all possible downstream conditionals to a single representation learned using only data-driven spectral structure (i.e., unlabeled data).

Unified Framework for SSL Algorithms

By analyzing prevailing SSL objectives through this spectral lens, the paper rigorously classifies and interrelates contrastive (e.g., InfoNCE/SimCLR), non-contrastive (e.g., BYOL, Barlow Twins, VICReg), energy-based, and latent-variable approaches:

  • Contrastive objectives (e.g., square contrastive loss, SimCLR) enforce similarity of representations for positive pairs and repulsion for negatives; these are shown to be particular statistical estimators of the spectral operator associated with P(x′∣x).
  • Non-contrastive objectives (e.g., BYOL, MINC) are shown to implement stochastic (block) power iteration towards the principal spectral directions by recursive applications of fixed-point equations.
  • Variance-invariance-covariance regularization (VICReg) and Barlow Twins are interpreted as regularized objectives enforcing decorrelation (spectral orthogonality) and invariance, bridging them directly to classical decorrelated spectral embedding techniques.
  • Energy-based models (EBMs) are shown to produce infinite-dimensional feature expansions (cf. random feature representations) capturing nonlinear interactions, and the optimal downstream estimator becomes a kernel machine.
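The block power iteration that the non-contrastive interpretation refers to can be sketched in a few lines (our construction, using an explicit operator matrix with a known spectrum; in SSL the operator is only accessible through samples and the matrix-vector products are replaced by stochastic estimates):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 3

# Synthetic symmetric operator with a known spectrum and a clear eigengap.
V = np.linalg.qr(rng.standard_normal((n, n)))[0]
evals = np.concatenate([[10.0, 8.0, 6.0], np.linspace(1.0, 0.1, n - 3)])
A = (V * evals) @ V.T

def block_power_iteration(A, k, iters=100):
    """Recover the top-k invariant subspace of A by repeatedly applying
    the operator and re-orthonormalizing (QR) -- the fixed-point recursion
    that the spectral reading attributes to non-contrastive updates."""
    Q = np.linalg.qr(np.random.default_rng(2).standard_normal((A.shape[0], k)))[0]
    for _ in range(iters):
        Q, _ = np.linalg.qr(A @ Q)
    return Q

Q = block_power_iteration(A, k)
# Compare with the exact top-k eigenspace via the projector distance.
Vk = V[:, :k]
proj_gap = np.linalg.norm(Q @ Q.T - Vk @ Vk.T)
assert proj_gap < 1e-6
```

Under this reading, the online/target-network recursion of BYOL-style methods plays the role of the `A @ Q` step, and the normalization tricks play the role of the QR re-orthonormalization.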

Crucially, the theory exposes that so-called "contrastive vs. non-contrastive" divides are superficial: the underlying spectral target is identical, while apparent algorithmic differences stem from how they estimate or regularize the spectrum and the statistical bias/variance trade-offs entailed by their batch-based approximations. Strong claims are made regarding the stochastic gradient compatibility of different objectives.

Algorithmic Design, Expressiveness, and Optimization

The paper provides a detailed analysis of the practical implications of the spectral viewpoint for SSL algorithms, emphasizing optimization and estimator bias. The authors show:

  • SimCLR/InfoNCE corresponds to ranking-based noise contrastive estimation (NCE) on an EBM-parameterized exponential kernel; although these estimators are consistent, the minibatch log-sum-exp term yields biased stochastic gradients, so large batch sizes are required to suppress that bias.
  • Square contrastive and binary NCE objectives allow for unbiased stochastic gradients even under modest batch sizes, favoring scalability.
  • Non-contrastive approaches like BYOL/MINC implement power iteration consistent with spectral decomposition but their optimization is unbiased and fully compatible with stochastic gradients—a significant practical insight for scalability in distributed contexts.
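The bias contrast in the bullets above can be demonstrated numerically (our illustrative sketch, not the paper's experiment): by Jensen's inequality the minibatch log-sum-exp estimator is biased downward relative to its full-population value, whereas a minibatch mean of squared scores is an unbiased estimator of the square-loss component at any batch size.

```python
import numpy as np

rng = np.random.default_rng(2)
scores = rng.standard_normal(10_000)  # stand-in similarity scores s(x, x'_i)

# Full-population values of the two loss components.
full_lse = np.log(np.mean(np.exp(scores)))  # log-sum-exp (InfoNCE-style)
full_sq = np.mean(scores**2)                # square-loss component

# Average each minibatch estimator over many small batches.
batch, trials = 32, 20_000
idx = rng.integers(0, scores.size, size=(trials, batch))
mb = scores[idx]
est_lse = np.mean(np.log(np.mean(np.exp(mb), axis=1)))  # biased (Jensen)
est_sq = np.mean(np.mean(mb**2, axis=1))                # unbiased

print(f"log-sum-exp bias: {est_lse - full_lse:+.4f}")   # systematically negative
print(f"square-loss bias: {est_sq - full_sq:+.4f}")     # ~0 up to sampling noise
```

The log-sum-exp bias shrinks as the batch grows, which is exactly the large-batch pressure the summary attributes to InfoNCE-style objectives.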

The analysis of energy-based and latent-variable parameterizations clarifies their expressiveness: spectral representations with linear parameterizations are limited in rank, whereas EBMs and implicit models realize infinite-rank spectral decompositions, albeit at the cost of nonlinear downstream predictors and optimization complexity.

An explicit tradeoff is identified: expressive, nonlinear energy-based or deep latent variable models afford greater flexibility at the cost of increased complexity in designing optimal downstream tasks, which for spectral methods remain linear and efficiently learnable.
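One way to make the finite- vs infinite-rank distinction concrete is the standard primal/dual ridge-regression identity (a textbook fact, our sketch rather than the paper's formulation): with finite-rank features ϕ(x), the downstream predictor can be fit as a linear model, and it coincides exactly with a kernel machine whose kernel is K = ΦΦᵀ. An EBM's implicit feature map is generally infinite-dimensional, leaving only the kernel (dual) form tractable.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, r = 100, 5, 4

X = rng.standard_normal((n, d))
Phi = np.tanh(X[:, :r])       # stand-in finite-rank features phi(x)
y = rng.standard_normal(n)
lam = 0.1

# Primal: linear ridge in the finite-rank spectral representation.
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(r), Phi.T @ y)
pred_linear = Phi @ w

# Dual: kernel machine with K = Phi @ Phi.T -- the only accessible form
# when the feature map is infinite-dimensional, as for an EBM.
K = Phi @ Phi.T
alpha = np.linalg.solve(K + lam * np.eye(n), y)
pred_kernel = K @ alpha

assert np.allclose(pred_linear, pred_kernel)
```

The primal solve costs O(r³) while the dual solve costs O(n³), which is the practical face of the tradeoff: low-rank spectral representations buy cheap linear downstream learning, while infinite-rank parameterizations force kernel-style computation over the training set.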

Theoretical and Practical Implications

The theoretical synthesis results in several strong positions:

  • All mainstream SSL approaches—contrastive, non-contrastive, energy-based, and latent variable—are, in principle, optimizing for (possibly regularized) components of the spectral decomposition of a data-generating operator.
  • The observed empirical gaps between algorithms are primarily determined by parameterization choices (e.g., low-rank, kernel-induced, or energy-based) and by the bias/variance properties of the associated optimization procedures, not by foundational differences in information captured by contrastive vs. non-contrastive mechanisms.
  • For multimodal representation learning (e.g., CLIP), the framework recovers not just alignment-based objectives but also shows how learned representations retain spectral sufficiency for optimal downstream generative and discriminative tasks.
  • The authors connect modern SSL research to a suite of classical component analysis and manifold learning techniques—PCA, LPP, CCA, SNE, NCA, k-means, Isomap—demonstrating that manifold learning, clustering algorithms, and SSL objectives are all specific regularizations and estimators for the spectral ghost underlying pairwise data distributions.
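The PCA case in the last bullet is easy to verify directly: the principal components are the top singular directions of the centered data matrix, i.e., the spectral decomposition of the empirical covariance operator (a classical identity, sketched here with NumPy; the data are synthetic).

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 6)) @ rng.standard_normal((6, 6))

Xc = X - X.mean(axis=0)       # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Principal components = top singular directions of the centered data,
# equivalently eigenvectors of the empirical covariance.
cov = Xc.T @ Xc / len(Xc)
w, V = np.linalg.eigh(cov)    # ascending eigenvalues
top = V[:, ::-1][:, :2]       # top-2 covariance eigenvectors
pc = Vt[:2].T                 # top-2 right singular vectors of Xc

# The two bases agree up to sign.
agree = np.abs(np.sum(top * pc, axis=0))
assert np.all(agree > 0.999)
```

On this view, PCA is simply the spectral decomposition applied to one particular pairwise operator, and the other classical methods (LPP, CCA, SNE, k-means, Isomap) swap in differently weighted operators.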

Extensions: Applications and Future Directions

The spectral view provides both a theoretical and algorithmic roadmap for multiple current and emergent applications:

  • Causal inference: The spectral approach has direct utility in identifying structural relationships where unobserved confounders are present—e.g., via instrumental variables and proxy structures—by leveraging operator decompositions to recover causal effects using only observables [sun2024spectral].
  • Reinforcement learning: Spectral representations enable efficient planning and uncertainty quantification in MDPs and POMDPs via spectral-decomposable transition operators, unifying model-based and model-free paradigms [ren2022spectral, zhang2022making, gao2025spectral].
  • Controllable synthesis and generative modeling: The kernel/energy-based and latent variable views subsume recent advances in diffusion models, offering consistent spectral parameterizations for generative tasks.
  • Scalability: The stochastic compatibility and unbiased gradient properties of spectral-based algorithms provide an avenue for efficient and large-scale SSL beyond the heavy batch constraints of legacy contrastive approaches.

The paper identifies open research frontiers in ablation of parameterization and optimization choices, especially on the question of which spectral factorization regimes yield optimal empirical performance under statistical and computational constraints.

Conclusion

This work establishes the spectral foundation as the unifying principle in modern representation learning, encompassing and connecting SSL, classical component analysis, manifold methods, and energy-based/Fenchel-dual representations. The spectral lens not only offers precise sufficiency characterizations, but also enables comparison and design of scalable, unbiased, and expressive algorithms for SSL. As such, it provides both a theoretical toolkit and practical guidance for advancing the field, as well as for transferring these principles to related domains in causal discovery, generative modeling, and sequential decision-making.

References:

  • (2601.20154): "Spectral Ghost in Representation Learning: from Component Analysis to Self-Supervised Learning"
  • [sun2024spectral]: "Spectral representation for causal estimation with hidden confounders"
  • [ren2022spectral]: "Spectral decomposition representation for reinforcement learning"
  • [zhang2022making]: "Making linear MDPs practical via contrastive representation learning"
  • [gao2025spectral]: "Spectral Representation-based Reinforcement Learning"
