Spectral Bellman Method (SBM): A Unified Framework
- Spectral Bellman Method (SBM) is a unifying framework that applies spectral and functional-analytic techniques to policy evaluation and control in RL and optimal control.
- SBM accelerates value estimation by applying low-degree polynomial corrections (e.g., Chebyshev, Krylov) to the Bellman residual, achieving super-linear convergence.
- SBM extends to representation learning and distributional RL by aligning value function approximations with spectral properties and utilizing Hilbert space embeddings.
The Spectral Bellman Method (SBM) is a unifying conceptual and algorithmic framework that exploits spectral and functional-analytic perspectives on Bellman operators for policy evaluation and control in reinforcement learning (RL), optimal control, and distributional RL. SBM encompasses a spectrum of theoretical, algorithmic, and representational advances: accelerated value estimation via spectral/Krylov polynomial acceleration, spectral reconstruction of value functions via inverse-scattering, function approximation schemes in spectral Barron spaces, Bellman-aligned feature learning objectives, and Hilbert space embeddings for distributional Bellman dynamics. Modern SBM formulations systematically exploit the spectral structure of Bellman operators, yielding improved contraction, convergence, and representation properties relative to traditional approaches.
1. Spectral Acceleration of Bellman Operators
A central insight of SBM is the explicit use of the spectral decomposition of linearized Bellman operators (e.g., $\gamma P$ for transition kernel $P$ and discount factor $\gamma$) to accelerate policy evaluation. In classical policy evaluation, the value function $V^\pi$ is the solution to the linear fixed-point equation $V = r + \gamma P V$. Fixed-point iteration converges at geometric rate $\gamma$, independent of the finer structure of $P$'s spectrum. SBM instead considers the evolution of the error $\Delta_t = V_t - V^\pi$ and explicitly applies low-degree polynomials in $\gamma P$ to the error, yielding accelerated convergence governed by the spectral shape of $I - \gamma P$. In exact arithmetic, the optimal such polynomial (e.g., Chebyshev or conjugate-gradient related) achieves super-linear convergence rates depending on the restricted spectrum of $I - \gamma P$ on successively smaller subspaces, as shown by the Krylov-Bellman Boosting (KBB) algorithm (Xia et al., 2022).
KBB operates by generating successive Krylov subspaces
$$\mathcal{K}_t = \mathrm{span}\{\delta_0,\, (\gamma P)\,\delta_0,\, \ldots,\, (\gamma P)^{t-1}\delta_0\},$$
where $\delta_0 = r + \gamma P V_0 - V_0$ is the initial Bellman residual, and restricts each new value function estimate to the affine space $V_0 + \mathcal{K}_t$. This optimal polynomial correction framework yields super-linear contraction, with the error after $t$ iterations bounded, in conjugate-gradient style, as
$$\|\Delta_t\| \;\lesssim\; \prod_{j=1}^{t} \frac{\sqrt{\kappa_j} - 1}{\sqrt{\kappa_j} + 1}\, \|\Delta_0\|, \qquad \kappa_j = \mu_{\max,j} / \mu_{\min,j},$$
where $\mu_{\min,j}$, $\mu_{\max,j}$ are the minimal and maximal eigenvalues of $I - \gamma P$ restricted to successive orthogonal complements. Typically, this yields a "double-logarithmic" sample complexity and large reductions over fitted value iteration, without increasing per-iteration complexity (Xia et al., 2022).
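The contrast between plain geometric fixed-point iteration and Krylov polynomial acceleration can be sketched numerically. The following is a minimal illustration, not the KBB algorithm itself: it assumes exact transitions, a lazy random walk on a cycle (so that $I - \gamma P$ is symmetric positive definite), and textbook conjugate gradient as the optimal-polynomial Krylov solver.

```python
import numpy as np

# Minimal sketch: Krylov (conjugate-gradient) acceleration of policy
# evaluation versus plain fixed-point iteration. The chain below (a lazy
# random walk on a cycle) is an illustrative assumption chosen so that
# A = I - gamma*P is symmetric positive definite.
n, gamma = 200, 0.99
P = np.zeros((n, n))
idx = np.arange(n)
P[idx, idx] = 0.5
P[idx, (idx + 1) % n] = 0.25
P[idx, (idx - 1) % n] = 0.25
rng = np.random.default_rng(0)
r = rng.standard_normal(n)
V_star = np.linalg.solve(np.eye(n) - gamma * P, r)   # ground truth

# Plain fixed-point iteration V <- r + gamma P V (geometric rate ~ gamma).
V, fp_iters = np.zeros(n), 0
while np.linalg.norm(r + gamma * P @ V - V) > 1e-8 * np.linalg.norm(r):
    V = r + gamma * P @ V
    fp_iters += 1

# Conjugate gradient on (I - gamma P) V = r: the t-th iterate is the
# optimal polynomial correction within the Krylov subspace K_t(A, r).
A = np.eye(n) - gamma * P
x, res = np.zeros(n), r.copy()
p, cg_iters = res.copy(), 0
while np.linalg.norm(res) > 1e-8 * np.linalg.norm(r):
    Ap = A @ p
    alpha = res @ res / (p @ Ap)
    x += alpha * p
    new_res = res - alpha * Ap
    p = new_res + (new_res @ new_res / (res @ res)) * p
    res = new_res
    cg_iters += 1

print(fp_iters, cg_iters)   # CG terminates in far fewer iterations
```

With $\gamma = 0.99$, the fixed-point loop needs on the order of $\log(1/\varepsilon)/\log(1/\gamma)$ iterations, while the Krylov iterate exploits the restricted spectrum of $I - \gamma P$ and converges in a small fraction of that count.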
2. Spectral Methods for Optimal Control: Inverse Scattering and Spectral Barron Spaces
SBM has been extended to the solution of Hamilton-Jacobi-Bellman (HJB) equations arising in optimal control. In the context of linear-quadratic Markov decision processes (LMDPs), the Bellman (or HJB) PDE can be recast as a spectral (Schrödinger-type) problem via suitable transformations (e.g., Cole–Hopf), reducing the nonlinear PDE to a linear backward Kolmogorov or Schrödinger equation with a potential function capturing the cost-to-go. The system's optimal control law is then extracted from the solution to this spectral problem (Schneider et al., 2022). The potential is reconstructed from boundary (asymptotic) data via the Gel'fand–Levitan–Marchenko (GLM) integral equation, a machinery borrowed from quantum inverse scattering. This spectral route obviates path-sum or grid-based dynamic programming and admits highly parallelizable computation, with main algorithmic cost dominated by the GLM solve and FFTs. However, the method is specific to LMDPs with linear drift and quadratic costs, and may require nontrivial extensions for nonquadratic or multidimensional cases.
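A discrete analogue of this linearization can be sketched in the style of linearly solvable MDPs: exponentiating the cost-to-go turns the nonlinear Bellman equation into a linear system for the "desirability" function. The chain, costs, and first-exit setting below are illustrative assumptions, not the GLM construction of the cited work.

```python
import numpy as np

# Hedged sketch of a discrete Cole-Hopf-type linearization. For a
# first-exit linearly solvable MDP with passive dynamics P and state
# cost q, the desirability z = exp(-v) satisfies the *linear* equation
# z = exp(-q) * (P z) on interior states, so the nonlinear Bellman
# equation reduces to one linear solve. All specifics (chain size,
# costs) are illustrative assumptions.
n = 20                                   # states 0..n-1; state n-1 is the goal
rng = np.random.default_rng(1)
P = rng.random((n, n)); P /= P.sum(1, keepdims=True)   # passive dynamics
q = rng.random(n); q[-1] = 0.0           # state costs, zero at the goal

G = np.diag(np.exp(-q[:-1])) @ P[:-1, :]          # interior rows of exp(-q)*P
# Interior desirabilities solve (I - G[:, :-1]) z_int = G[:, -1] * z_goal,
# with z_goal = exp(-q_goal) = 1.
z = np.ones(n)
z[:-1] = np.linalg.solve(np.eye(n - 1) - G[:, :-1], G[:, -1])
v = -np.log(z)                                     # optimal cost-to-go

# Verify the exponentiated (linear) Bellman equation on interior states.
resid = np.max(np.abs(z[:-1] - np.exp(-q[:-1]) * (P[:-1] @ z)))
print(resid)
```

The point of the transform is visible in the code: the dynamic-programming recursion is replaced by a single sparse linear solve, which is the discrete counterpart of reducing the HJB PDE to a linear Schrödinger-type problem.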
An alternative spectral approach targets the functional spaces wherein HJB solutions lie. Under the assumption that the coefficients of the HJB PDE are spectral Barron functions—a class of functions with controlled Fourier decay—one can show the existence of classical solutions within Banach spaces of spectral Barron type. Iterative policy improvement (Howard's method) can be performed entirely within these spaces. Each policy evaluation step in the iteration solves a linear elliptic PDE in the spectral Barron class, and the feedback map from value to policy preserves the Barron structure. Crucially, the solution admits quantitative approximation by shallow (two-layer) neural networks, at rates independent of the ambient space dimension, thus avoiding the curse of dimensionality (Feng et al., 24 Mar 2025).
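The dimension-independence claim rests on Barron-type approximation bounds; schematically (the exact norm indices used by the cited work may differ), for the spectral Barron norm
$$\|f\|_{\mathcal{B}^s} \;=\; \int_{\mathbb{R}^d} \bigl(1 + \|\omega\|\bigr)^{s}\, |\hat f(\omega)|\, \mathrm{d}\omega, \qquad \inf_{f_m} \|f - f_m\|_{L^2(\mu)} \;\lesssim\; \frac{\|f\|_{\mathcal{B}^1}}{\sqrt{m}},$$
where the infimum ranges over two-layer networks $f_m$ with $m$ neurons: the rate $m^{-1/2}$ depends on the Barron norm of $f$ but not on the ambient dimension $d$.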
3. Spectral Bellman Representations for Representation Learning
In value-based RL, the Spectral Bellman Method introduces a principle for learning feature representations that are directly aligned to the spectral structure of Bellman dynamics via the Inherent Bellman Error (IBE) condition. The IBE measures the closure of a linear value-function class $\mathcal{Q}_\Phi = \{\Phi\theta\}$ under the Bellman operator $\mathcal{T}$:
$$\mathcal{I}(\Phi) \;=\; \sup_{Q \in \mathcal{Q}_\Phi}\; \inf_{Q' \in \mathcal{Q}_\Phi} \|Q' - \mathcal{T}Q\|.$$
Zero IBE implies that the Bellman image of any value function in the class remains in the class (possibly after projection), reducing to the linear MDP regime in special cases. Under zero IBE, the Bellman transformation of value functions admits a spectral (singular value decomposition) structure, with nonzero singular values directly matching the feature covariance matrices' eigenvalues (Nabati et al., 17 Jul 2025).
SBM for feature learning derives a joint objective that enforces the coupled "power-iteration" structure of Bellman dynamics on the space of feature and parameter covariances. The representation update alternates steps minimizing a loss of the form
$$\mathcal{L} \;=\; \mathcal{L}_{\mathrm{cov}} + \lambda\, \mathcal{L}_{\mathrm{orth}},$$
where $\mathcal{L}_{\mathrm{cov}}$ imposes Bellman-aligned covariance constraints and $\mathcal{L}_{\mathrm{orth}}$ enforces orthogonality of features and parameters. Integrating this update with standard value-based methods (e.g., DQN, R2D2), one obtains improved sample efficiency, especially in challenging exploration regimes, and enables structured exploration by targeting the most uncertain value directions. SBM supports a multi-step extension, with theoretical contraction of the $k$-step IBE as a function of $k$ (Nabati et al., 17 Jul 2025).
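The zero-IBE condition can be probed directly in a small tabular MDP. A minimal sketch follows; the MDP, feature matrices, and Monte-Carlo search over the unit ball are illustrative assumptions, not the learning objective of the cited work.

```python
import numpy as np

# Hedged sketch: numerically probing the inherent Bellman error (IBE)
# of a linear value-function class {Phi @ theta} in a small tabular MDP.
rng = np.random.default_rng(0)
n, d, gamma = 8, 3, 0.9
P = rng.random((n, n)); P /= P.sum(1, keepdims=True)
r = rng.random(n)

def ibe(Phi, n_dirs=2000):
    """Monte-Carlo lower bound on sup_theta ||(I - Pi) T(Phi theta)||
    over the unit ball, where Pi projects onto span(Phi)."""
    Pi = Phi @ np.linalg.pinv(Phi)          # orthogonal projector
    worst = 0.0
    for _ in range(n_dirs):
        theta = rng.standard_normal(Phi.shape[1])
        theta /= np.linalg.norm(theta)
        TV = r + gamma * P @ (Phi @ theta)  # Bellman image of Phi theta
        worst = max(worst, np.linalg.norm(TV - Pi @ TV))
    return worst

ibe_tab = ibe(np.eye(n))             # tabular features: class is Bellman-closed
ibe_rand = ibe(rng.random((n, d)))   # generic low-dimensional features
print(ibe_tab, ibe_rand)             # ibe_tab ~ 0, ibe_rand strictly positive
```

With full tabular features the class is exactly Bellman-closed, so the IBE vanishes; a generic low-dimensional feature matrix leaves a nonzero Bellman residual outside the span, which is precisely what the SBM feature-learning objective drives toward zero.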
4. Distributional SBM: Hilbert-Space Embeddings in the Cramér Geometry
In distributional RL, SBM applies to the Bellman operator acting not on scalar value functions but on return distributions (or their CDFs), using the Cramér metric as the natural geometric structure. The distributional Bellman operator acts affinely on CDFs and linearly on differences of CDFs. This motivates representing distributional dynamics in an $L^2$-type Hilbert space of centered CDFs and, more generally, in regularized spectral Hilbert spaces defined by Fourier norms with frequency-dependent weights, of the form
$$\|f\|_{w}^{2} \;=\; \int_{\mathbb{R}} w(\omega)\, |\hat f(\omega)|^{2}\, \mathrm{d}\omega.$$
The mapping from centered CDFs into this space is carried out by a conjugation operator, and the action of the Bellman operator is diagonalized in a basis solving a Fredholm integral eigenproblem with respect to the space's geometry. Truncated projections onto the leading spectral modes yield practical approximations with rigorously controllable geometric error and linear convergence rates. In the zero-regularization limit, the geometry coincides with the native Cramér metric, providing an operator-theoretically faithful spectral embedding for the distributional Bellman update (Wang et al., 13 Mar 2026).
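The Cramér-geometry contraction underlying this construction can be checked numerically. A minimal sketch on a two-state MDP follows, representing return distributions by CDFs on a common grid; the grid, rewards, transitions, and Dirac return distributions are illustrative assumptions.

```python
import numpy as np

# Hedged sketch: the distributional Bellman operator on CDFs over a
# fixed grid, and its sqrt(gamma)-contraction in the supremal Cramer
# metric. The two-state MDP below is an illustrative assumption.
gamma = 0.8
z = np.linspace(-10.0, 10.0, 2001)        # common support grid
P = np.array([[0.7, 0.3], [0.4, 0.6]])    # transition matrix
r = np.array([1.0, -1.0])                 # deterministic state rewards

def bellman_cdf(F):
    """(T F)_s(z) = sum_s' P[s, s'] * F_s'((z - r[s]) / gamma)."""
    out = np.zeros_like(F)
    for s in range(len(r)):
        zs = (z - r[s]) / gamma
        for sp in range(len(r)):
            out[s] += P[s, sp] * np.interp(zs, z, F[sp], left=0.0, right=1.0)
    return out

def cramer(F, G):
    """Supremal Cramer distance: max over states of the l2 CDF distance."""
    dz = z[1] - z[0]
    return np.sqrt((((F - G) ** 2).sum(axis=1) * dz).max())

# Two collections of per-state return distributions (Dirac CDFs).
F1 = np.stack([(z > -2.0).astype(float), (z > 0.0).astype(float)])
F2 = np.stack([(z > 1.0).astype(float), (z > 3.0).astype(float)])
d0 = cramer(F1, F2)
d1 = cramer(bellman_cdf(F1), bellman_cdf(F2))
print(d0, d1, d1 <= np.sqrt(gamma) * d0)   # contraction by sqrt(gamma)
```

The rescaling by $1/\gamma$ inside the CDF argument contracts squared Cramér distance by a factor $\gamma$, and the mixture over next states is non-expansive by convexity, giving the $\sqrt{\gamma}$-contraction that the spectral embedding preserves.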
5. Computational Complexity, Sample Efficiency, and Empirical Results
SBM-based algorithms achieve improved computational and sample complexity via several mechanisms:
- Krylov acceleration: For policy evaluation, SBM's polynomial acceleration in Krylov subspaces, as instantiated by KBB, achieves super-linear convergence: $O(\log\log(1/\epsilon))$ iterations to reach error $\epsilon$, compared to $O(\log(1/\epsilon))$ for geometric convergence (Xia et al., 2022).
- Sample complexity: SBM allows the sample size per iteration to be chosen based on localized complexity (of the regression class, etc.), yielding total sample complexity dramatically smaller than that of classical fitted value iteration for typical nonparametric classes.
- Control via spectral inverse scattering: Dominated by the solution of GLM equations, FFTs, and sparse matrix operations; highly parallelizable (Schneider et al., 2022).
- Representation learning: Empirical results on Atari benchmarks show that SBM-enhanced DQN and R2D2 achieve significant improvements in Human-Normalized Score, most pronounced on hard-exploration games. Thompson-sampling exploration based on SBM feature covariances further amplifies gains (Nabati et al., 17 Jul 2025).
- Distributional SBM: Iterative coefficient updates involve at most $O(m)$ operations per state-action pair for $m$ retained spectral modes, with accuracy controlled by truncation and regularization parameters (Wang et al., 13 Mar 2026).
6. Limitations, Scope, and Extensions
SBM frameworks are subject to various structural, model, and implementation assumptions:
- Polynomial acceleration and super-linear convergence in KBB depend on favorable spectral gaps and reversibility or positivity of $P$ (Xia et al., 2022).
- The inverse scattering formulation presumes LMDP structure; i.e., linear drift, additive Gaussian noise, and quadratic control costs. Cole–Hopf linearization is invalid for nonquadratic or state-dependent penalties. Higher-dimensional scattering problems quickly become intractable (Schneider et al., 2022).
- The spectral Barron space approach applies when HJB coefficients are spectral Barron functions, and guarantees rely on sufficiently large discount factor (Feng et al., 24 Mar 2025).
- Multi-step and distributional SBM variants require precise matching of the operator's spectral geometry and underlying measure concentration. Truncation and regularization introduce approximation errors that must be monitored.
- Empirically, parallelizability and estimation/sampling issues become prominent in high-dimensional or large-scale MDPs.
7. Theoretical Significance and Synthesis
SBM unites several strands of theory: operator spectra in policy evaluation and distributional RL, analytic transforms reducing nonlinear HJB-type equations to spectral linear problems, functional spaces supporting high-dimensional approximation, and the explicit alignment of learned representations with functional or operator-analytic structure underlying Bellman updates. This synthesis enables both rigorous control of convergence and approximation quality (often independent of dimension, in the case of spectral Barron spaces) and practical construction of algorithms with improved empirical and computational properties. The spectral viewpoint thus deepens understanding of value function propagation, optimality structures, and the design of sample- and computation-efficient RL and control algorithms (Xia et al., 2022, Schneider et al., 2022, Feng et al., 24 Mar 2025, Nabati et al., 17 Jul 2025, Wang et al., 13 Mar 2026).