
Spectral Bellman Representation

Updated 19 July 2025
  • Spectral Bellman representation is a framework that combines spectral analysis and Bellman operators to analyze and learn solutions in dynamic programming and stochastic control.
  • It leverages eigenstructure and feature covariance to ensure that value function approximations remain within a closed, efficiently learnable function space.
  • This approach enhances algorithm efficiency and stability in reinforcement learning and optimal control, enabling robust exploration and improved policy evaluation.

The spectral Bellman representation is a theoretical and algorithmic framework for analyzing, parameterizing, and learning solutions to Bellman equations via their spectral (eigenstructure, covariance, or decomposition) properties. It combines spectral methods from functional and operator theory with the structure of Bellman operators as they arise in stochastic control, reinforcement learning, and Hamilton–Jacobi–Bellman (HJB) equations. This representation emphasizes the alignment between the feature covariance (or the eigenstructure of associated operators) and the action of the Bellman operator, leading to advances in the approximation of value functions, efficient reinforcement learning algorithms, and high-dimensional control.

1. Foundational Principles: Bellman Equations and Spectral Analysis

Bellman equations express optimality principles in dynamic programming and stochastic control. In continuous state spaces, these equations take the form of partial differential, integro-partial differential, or difference equations—often fully nonlinear or nonlocal, as in HJB or Q-function recursions.

Spectral analysis, in this context, concerns the study of the eigenvalues and eigenfunctions (or, in data-driven regimes, the covariance structure) of operators associated with the Bellman equation. For many nonlocal operators (including those with jump processes), the spectral decomposition provides insight into regularity, stability, and the convergence properties of the solution. For value-based reinforcement learning, the feature covariance matrix plays an analogous role, encapsulating the spread and alignment of the learned representation relative to the Bellman operator’s effect (Gong et al., 2017, Chang et al., 2022, Nabati et al., 17 Jul 2025).

A central object is the (possibly nonlocal) Bellman operator $\mathcal{T}$, which, when applied to a function $Q$, is often defined as

$$\mathcal{T} Q(s, a) = r(s, a) + \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a),\, a' \sim \pi(s')}\left[ Q(s', a') \right].$$

Spectral analysis investigates how $\mathcal{T}$ acts on a subspace of functions, such as those spanned by a set of features.
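As a concrete illustration, this backup can be applied exactly in a small tabular MDP. The sketch below builds a hypothetical random MDP (all quantities are synthetic, chosen purely for illustration) and iterates the policy-evaluation backup $\mathcal{T}$ to its fixed point $Q^\pi$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Random MDP: P[s, a] is a distribution over next states s'.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(size=(n_states, n_actions))
# Fixed evaluation policy pi(a|s), uniform here.
pi = np.full((n_states, n_actions), 1.0 / n_actions)

def bellman_backup(Q):
    """Apply T Q(s, a) = r(s, a) + gamma * E_{s'~P, a'~pi}[Q(s', a')]."""
    v_next = (pi * Q).sum(axis=1)      # V(s') = E_{a'~pi}[Q(s', a')]
    return r + gamma * (P @ v_next)    # expectation over s' via P

Q = np.zeros((n_states, n_actions))
for _ in range(200):                   # gamma-contraction: iteration converges
    Q = bellman_backup(Q)
```

Because $\mathcal{T}$ is a $\gamma$-contraction in the sup norm, the loop converges geometrically to the unique fixed point $Q^\pi = \mathcal{T} Q^\pi$.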

2. Inherent Bellman Error and Covariance Closure

A key technical advance in spectral Bellman representations is the quantification of when a chosen feature map $\phi$ is “closed” under the action of the Bellman operator. This is formally captured by the Inherent Bellman Error (IBE) condition:

$$\mathcal{I}_\phi = \sup_{Q \in \mathcal{Q}_\phi} \inf_{\widetilde{Q} \in \mathcal{Q}_\phi} \big\| \mathcal{T} Q - \widetilde{Q} \big\|_\infty,$$

where $\mathcal{Q}_\phi$ is the linear function space spanned by $\phi$. The case $\mathcal{I}_\phi = 0$ (zero IBE) is crucial: it implies that after applying the Bellman backup, the result remains in the span of $\phi$.

Under the zero-IBE condition, the transformation of any $Q_\theta(s, a) = \phi(s, a)^\top \theta$ by $\mathcal{T}$ results in another element of the same function space, and this transformation is determined by the covariance matrix of the features. Specifically, the spectral decomposition of the feature covariance matrix governs the action of the Bellman operator on the representation (Nabati et al., 17 Jul 2025, Chang et al., 2022).
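The closure condition can be checked numerically for a given feature map by projecting the backed-up function back onto the span of the features and measuring the residual. In this illustrative sketch (a synthetic MDP; full-rank tabular features, so zero IBE holds by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma, d = 4, 2, 0.9, 8

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
r = rng.uniform(size=(n_states, n_actions))
pi = np.full((n_states, n_actions), 1.0 / n_actions)

# Full-rank features over the 8 state-action pairs: span(phi) is the whole
# function space, so the inherent Bellman error is exactly zero.
Phi = np.eye(n_states * n_actions)  # rows are phi(s, a)

def backup(q_flat):
    Q = q_flat.reshape(n_states, n_actions)
    v = (pi * Q).sum(axis=1)
    return (r + gamma * (P @ v)).ravel()

def bellman_projection_residual(theta):
    """|| T Q_theta - Pi_phi T Q_theta ||_inf, with Pi_phi the
    least-squares projection back onto span(phi)."""
    tq = backup(Phi @ theta)
    theta_proj, *_ = np.linalg.lstsq(Phi, tq, rcond=None)
    return np.max(np.abs(tq - Phi @ theta_proj))

ibe_residual = bellman_projection_residual(rng.normal(size=d))
```

For a restricted (low-rank) $\Phi$, the same residual, maximized over $\theta$, estimates how far the feature space is from Bellman completeness.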

3. Spectral Relationships in Operator and Feature Spaces

The spectral Bellman representation hinges on a mathematically explicit connection between the covariance structure of the feature space and the action of the Bellman operator. Let

$$\Lambda_1 = \mathbb{E}_{(s,a) \sim \rho} \left[ \phi(s, a)\, \phi(s, a)^\top \right]$$

denote the feature covariance matrix under an appropriate distribution $\rho$.

Through singular value decomposition (SVD), the transformation effected by the Bellman operator on a distribution of value functions is governed by the principal directions (eigenvectors) and scales (eigenvalues) of $\Lambda_1$. The alignment of the learned features with these eigen-directions ensures that the backup (i.e., the Bellman update) does not “escape” the representation space, enabling efficient learning and uncertainty quantification (Nabati et al., 17 Jul 2025).
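The covariance $\Lambda_1$ and its eigen-decomposition are straightforward to estimate from sampled features. The sketch below uses synthetic, anisotropic Gaussian features purely as a stand-in for features drawn under $\rho$; it also computes a log-det coverage score of the kind used for spectral regularization:

```python
import numpy as np

rng = np.random.default_rng(2)
n_samples, d = 5000, 6

# Stand-in for features phi(s, a) sampled under a behavior distribution rho:
# Gaussian features with an anisotropic ground-truth covariance.
A = rng.normal(size=(d, d))
phi = rng.normal(size=(n_samples, d)) @ A

# Empirical feature covariance Lambda_1 = E[phi phi^T].
Lambda1 = phi.T @ phi / n_samples

# Principal directions (eigenvectors) and scales (eigenvalues) of Lambda_1;
# eigh is appropriate because Lambda_1 is symmetric positive semidefinite.
eigvals, eigvecs = np.linalg.eigh(Lambda1)

# Log-det coverage: large when the features spread over all directions.
coverage = np.sum(np.log(np.clip(eigvals, 1e-12, None)))
```

Small eigenvalues of $\Lambda_1$ flag directions the data barely covers, which is exactly where backed-up value functions risk escaping the representation.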

In the context of spectral Barron spaces, spectral norms and Fourier analysis underlie the approximation properties, ensuring high regularity of the solution and the feasibility of dimension-robust neural approximations (Feng et al., 24 Mar 2025).

4. Algorithms and Representation Learning

Several algorithmic frameworks leverage spectral Bellman representations:

  • Bilevel Optimization for Bellman Completeness: To ensure that the learned features are approximately closed under Bellman backups, a bilevel objective is employed. The inner objective finds linear parameters that minimize the Bellman residual, while the outer objective adjusts the feature map to maximize coverage (via spectral regularization such as maximizing $\log \det \Sigma(\phi)$) and enforce Bellman completeness (Chang et al., 2022).
  • Spectral Bellman Method (SBM): Alternates between optimizing policy/value parameters and jointly learning representations and associated parameter mappings that align with the Bellman-induced covariance structure. Orthogonality constraints and moving-average estimates are used to stabilize learning, and the method readily extends to multi-step operators (Nabati et al., 17 Jul 2025).
  • Representation Rank Regularization: Theoretical connections between the Bellman equation and the cosine similarity of consecutive features motivate adaptive rank regularization (BEER regularizer), penalizing excessive alignment across feature vectors when the Bellman-induced upper bound is violated. This yields controlled expressiveness in deep Q-networks (He et al., 19 Apr 2024).
  • Diffusion Models for Successor State Measures: Diffusion-based generative models are adapted to represent the successor state measure under Bellman flow constraints, with KL-divergence losses decomposed over diffusion steps and deterministic target networks employed for low-variance updates (Schramm et al., 16 Jul 2024).
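A minimal sketch of the bilevel structure described in the first bullet (not the authors' implementation; all data are synthetic stand-ins): the inner step fits linear parameters against Bellman targets by ridge-regularized least squares, and the outer objective adds a negative log-det coverage penalty on the feature covariance.

```python
import numpy as np

def spectral_coverage_penalty(Phi, eps=1e-6):
    """Negative log-det of the empirical feature covariance: minimizing
    this term maximizes log det Sigma(phi), i.e. pushes the features to
    cover all directions. Phi has one row per sampled (s, a) pair."""
    Sigma = Phi.T @ Phi / Phi.shape[0] + eps * np.eye(Phi.shape[1])
    _, logdet = np.linalg.slogdet(Sigma)
    return -logdet

def inner_bellman_fit(Phi, targets):
    """Inner objective: linear parameters minimizing the Bellman residual
    on the current features (ridge-regularized least squares)."""
    d = Phi.shape[1]
    theta = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(d), Phi.T @ targets)
    residual = np.mean((Phi @ theta - targets) ** 2)
    return theta, residual

rng = np.random.default_rng(3)
Phi = rng.normal(size=(256, 8))        # stand-in for learned features
targets = rng.normal(size=256)         # stand-in for sampled T Q(s, a) values
theta, res = inner_bellman_fit(Phi, targets)
outer_loss = res + 0.1 * spectral_coverage_penalty(Phi)
```

In a full method, `outer_loss` would be differentiated through to update the feature map, re-solving the inner fit at each outer step.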

5. Applications and Empirical Results

Spectral Bellman representations have found application in areas such as:

  • Offline Policy Evaluation (OPE): Bellman-complete, spectrally regularized features enable stable and accurate OPE—often outperforming previously used contrastive or reconstruction-based representations, especially in high-dimensional control and out-of-distribution evaluation (Chang et al., 2022).
  • Exploration in RL: The feature covariance derived from the Spectral Bellman Method is used to quantify uncertainty for structured exploration. Thompson Sampling with this learned structure leads to superior performance in challenging RL environments with sparse rewards and long credit-assignment paths (e.g., Atari hard-exploration tasks) (Nabati et al., 17 Jul 2025).
  • Optimal Control and HJB Equations: The spectral Bellman framework connects robustly with solution theory for both local and nonlocal (integro-PDE) Bellman equations, with spectral decomposition aiding in numerical PDE approaches. Notably, in spectral Barron spaces, value functions and feedback laws for high-dimensional HJB equations are shown to be approximable by two-layer neural networks with convergence guarantees independent of dimension (Feng et al., 24 Mar 2025, Gong et al., 2017).
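The exploration use case above can be sketched as covariance-driven Thompson Sampling: parameters are perturbed along the uncertain directions of the feature covariance, then the agent acts greedily. A minimal, purely illustrative version (the parameter estimate and covariance are synthetic stand-ins for learned quantities):

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_actions = 6, 3

theta_hat = rng.normal(size=d)   # stand-in for the current value-parameter fit
# Stand-in for the learned feature covariance Lambda_1 (regularized for PD).
Lambda1 = np.cov(rng.normal(size=(500, d)), rowvar=False) + 1e-3 * np.eye(d)

def thompson_action(phi_s):
    """Sample parameters from a covariance-shaped posterior, act greedily.
    phi_s: (n_actions, d) features for each action in the current state.
    Directions poorly covered by Lambda_1 get larger perturbations."""
    cov = np.linalg.inv(Lambda1)
    theta_sample = rng.multivariate_normal(theta_hat, cov)
    return int(np.argmax(phi_s @ theta_sample))

phi_s = rng.normal(size=(n_actions, d))
a = thompson_action(phi_s)
```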

6. Theoretical Framework and Uniqueness Results

A rigorous theoretical underpinning has been established for spectral Bellman representations in various settings:

  • Stochastic Representation: Value functions for nonlocal HJB equations can be represented as expectations over controlled diffusion and jump processes, with spectral properties arising from the operator decomposition. Viscosity solution frameworks anchor the existence and uniqueness of solutions, accommodating degeneracy and nonlocality (Gong et al., 2017, Criens et al., 2023).
  • Faithful Parameterizations and Infinite Time Horizon: Using convex duality and representation theory, the spectrum of the Bellman (Hamiltonian) operator is linked to geometric and analytic properties of the problem, ensuring uniqueness (when solutions vanish at infinity) and providing a spectral lens on long-run ergodic behavior of value functions (Basco, 2022).
  • Convergence and Dimension-Independence: When spectral properties (such as sufficient coverage or a large enough discount factor) are satisfied, convergence of policy-iteration schemes and neural approximators is guaranteed—without a curse of dimensionality—through careful spectral norm or eigenvalue estimates (Feng et al., 24 Mar 2025).

7. Practical Implications and Future Directions

The spectral Bellman representation facilitates:

  • Efficient and stable feature learning aligned with Bellman dynamics, supporting better policy/value estimation and uncertainty decomposition.
  • Plug-and-play modifications in standard RL (DQN, policy gradient) pipelines to incorporate spectral regularization, coverage metrics, or structured exploration via covariance-driven perturbations.
  • Dimension-robust solution approaches for high-dimensional PDEs/Bellman equations, with theoretical justification for deep neural approximators.
  • A connection between analytical solution theory (PDEs, viscosity solutions) and the algorithmic perspective (representation learning, RL exploration), suggesting further unification in future research.

Theoretical developments in spectral Bellman representations are complemented by empirical improvements in sample efficiency, robustness, and generalization in practical control and RL scenarios. The alignment of representation learning with Bellman operator structure continues to be an active area of advancement and application across reinforcement learning, control theory, and numerical PDEs.