Bellman-Closed Feature Transport in RL

Updated 4 February 2026
  • Bellman-Closed Feature Transport is a framework ensuring that Bellman updates preserve the representational span of state-action features.
  • It employs a spectral learning objective that leverages singular value decomposition of feature covariances to enforce closure under Bellman dynamics.
  • The method enhances exploration and long-horizon credit assignment, demonstrating significant gains in performance on Atari benchmarks.

Bellman-Closed Feature Transport is a principled framework in reinforcement learning (RL) for learning state-action representations such that the span of value functions is preserved—i.e., “closed”—under Bellman backups. Developed as a core component of the Spectral Bellman Method (SBM), this perspective unifies representation learning and structured exploration by leveraging the inherent spectral properties of the Bellman operator when acting on linearly parameterized function classes. The central mechanism involves enforcing or approximating “Bellman-closure” through a spectral learning objective, ensuring that value estimates remain within the learned feature space as dictated by the Bellman update dynamics (Nabati et al., 17 Jul 2025).

1. Formal Foundations: Bellman Closure and Inherent Bellman Error

Let $\phi: S \times A \to \mathbb{R}^d$ be a feature map, and consider the linear value function class $\mathcal{Q}_\phi = \{ Q_\theta(s, a) = \phi(s, a)^\top \theta \mid \theta \in \mathcal{B}_\phi \}$. The Inherent Bellman Error (IBE) quantifies the minimal residual when projecting the outcome of a Bellman update back onto $\mathcal{Q}_\phi$:

$$\mathrm{IBE}_\phi := \sup_{Q \in \mathcal{Q}_\phi} \inf_{\tilde{Q} \in \mathcal{Q}_\phi} \| \mathcal{T} Q - \tilde{Q} \|_\infty = \sup_{\theta \in \mathcal{B}_\phi} \inf_{\tilde{\theta} \in \mathcal{B}_\phi} \| \mathcal{T} Q_\theta - Q_{\tilde{\theta}} \|_\infty.$$

Zero IBE (the "Bellman-closed" assumption) holds if $\mathrm{IBE}_\phi = 0$, i.e., the Bellman operator $\mathcal{T}$ maps $\mathcal{Q}_\phi$ exactly into itself. For any policy $\pi$, the $k$-step Bellman extension is defined recursively: $\mathcal{T}^k Q = \mathcal{T}(\mathcal{T}^{k-1} Q)$.

Bellman-closed feature transport refers to constructing or learning a feature space $\phi$ for which Bellman updates of any $Q_\theta$ remain in the span $\{\phi_i\}_{i=1}^d$, ensuring the representational stability of value iteration and policy evaluation.
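A minimal numerical sketch of the zero-IBE condition under policy evaluation, with entirely synthetic quantities. The sup-norm infimum over $\tilde\theta$ is replaced here by the least-squares projection onto $\operatorname{span}(\Phi)$ as a tractable stand-in, so the result is a proxy for the IBE rather than its exact value:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, gamma = 8, 3, 0.9   # number of (s, a) pairs, feature dim, discount

P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)  # fixed-policy transition matrix
r = rng.normal(size=n)                                     # reward vector

def ibe_proxy(Phi, num_samples=500, radius=1.0):
    """Monte-Carlo proxy for sup_theta inf_theta' ||T Q_theta - Q_theta'||_inf,
    using the L2 projection onto span(Phi) as a stand-in for the infimum."""
    Pi = Phi @ np.linalg.pinv(Phi)            # projection onto the feature span
    worst = 0.0
    for _ in range(num_samples):
        theta = rng.normal(size=Phi.shape[1])
        theta *= radius / np.linalg.norm(theta)   # sample on a parameter ball
        tq = r + gamma * P @ (Phi @ theta)        # Bellman backup of Q_theta
        worst = max(worst, np.abs(tq - Pi @ tq).max())
    return worst

Phi_rand = rng.normal(size=(n, d))   # generic features: span not closed under T
Phi_tab = np.eye(n)                  # tabular features: span = R^n, hence Bellman-closed

print(ibe_proxy(Phi_rand), ibe_proxy(Phi_tab))
```

Generic random features leave a strictly positive residual, while tabular features (whose span is all of $\mathbb{R}^n$) give a residual of zero, matching the zero-IBE definition.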

2. Spectral Decomposition and Feature Transport under Zero-IBE

If the Bellman-closed condition is met, a powerful spectral structure emerges. For finite $S \times A$ and parameter set $\{\theta_j\}_{j=1}^m$, let $\Phi \in \mathbb{R}^{n \times d}$ and $\Theta \in \mathbb{R}^{d \times m}$ aggregate features and parameters. Under zero IBE, for any set of weights $\rho(s, a)$ and $\nu(\theta)$, the Bellman transport is

$$\overline{Q} = P_{s,a} (\Phi \Theta) P_\theta = \Phi_P \tilde{\Theta}_P,$$

where $P_{s,a}$ and $P_\theta$ are diagonal matrices of the sampling distributions. The feature covariance $\Lambda = \mathbb{E}_\rho[\phi \phi^\top]$ dictates the singular values of $\overline{Q}$:

$$\overline{Q} = U \Sigma V^\top, \qquad \Sigma = \operatorname{diag}\big(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_d}, 0, \ldots\big).$$

The nonzero singular values are the square roots of the covariance eigenvalues $\lambda_i$. The Bellman operator acts linearly in feature space: $\mathcal{T}\phi = \phi A$ for some $A \in \mathbb{R}^{d \times d}$, so that for all $\theta$, $\mathcal{T} Q_\theta(s, a) = \phi(s, a)^\top \tilde{\theta}(\theta)$. The functional consequence is that Bellman transport never leaves $\operatorname{span}\{\phi_i\}$, achieving true feature transport under Bellman closure (Nabati et al., 17 Jul 2025).
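A numerical check of the singular-value claim, under the simplifying assumption that $\nu$ is an isotropic standard normal (so the sampled parameters are whitened, $\mathbb{E}_\nu[\theta\theta^\top] \approx I$) and $\rho$ is uniform; all matrices are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m = 50, 4, 20000

Phi = rng.normal(size=(n, d))               # feature matrix over all (s, a) pairs
rho = np.full(n, 1.0 / n)                   # uniform sampling distribution rho(s, a)
Theta = rng.normal(size=(d, m))             # theta_j ~ N(0, I), so E_nu[theta theta^T] ~ I

# Weighted Q matrix: rows scaled by sqrt(rho), columns by 1/sqrt(m)
Qbar = np.sqrt(rho)[:, None] * (Phi @ Theta) / np.sqrt(m)
sv = np.linalg.svd(Qbar, compute_uv=False)[:d]   # top-d (nonzero) singular values

Lambda = (Phi * rho[:, None]).T @ Phi            # feature covariance E_rho[phi phi^T]
lam = np.sort(np.linalg.eigvalsh(Lambda))[::-1]  # its eigenvalues, descending

# Nonzero singular values of Qbar track sqrt of the covariance eigenvalues
print(np.max(np.abs(sv - np.sqrt(lam))))
```

The discrepancy shrinks as $m$ grows, since the empirical second moment of the sampled parameters concentrates around the identity.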

3. Spectral Bellman Representation Objective

To operationalize Bellman-closed transport, SBM introduces a spectral loss function, distinct from the standard Bellman mean squared error:

$$\mathcal{L}_{\text{SBM}}(\phi, \tilde{\theta}; \rho, \nu) = \mathcal{L}_1(\phi) + \mathcal{L}_2(\tilde{\theta}) + \mathcal{L}_{\text{orth}}(\phi, \tilde{\theta}),$$

with

$$\mathcal{L}_1(\phi) = \mathbb{E}_{\rho, \nu}\big[ \phi(s, a)^\top \Lambda_2\, \phi(s, a) - 2 Q_\theta(s, a)\, \phi(s, a)^\top \tilde{\theta}(\theta) \big],$$

$$\mathcal{L}_2(\tilde{\theta}) = \mathbb{E}_{\rho, \nu}\big[ \tilde{\theta}(\theta)^\top \Lambda_1\, \tilde{\theta}(\theta) - 2 Q_\theta(s, a)\, \tilde{\theta}(\theta)^\top \phi(s, a) \big],$$

where $\Lambda_1$ and $\Lambda_2$ are batch-averaged feature and parameter covariances. The orthogonality regularizer $\mathcal{L}_{\text{orth}}$ enforces mutual decorrelation of features and transported parameters. This power-iteration-inspired loss embodies the alternating update structure of the singular value decomposition for the Bellman transport operator. At optimality, these losses guarantee that $\mathcal{T} Q_\theta$ remains in the representational span $\{\phi_i\}$ for all $\theta$, thus enforcing Bellman-closed transport.
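A one-batch NumPy sketch of the spectral loss, following the notation above. The exact form of the orthogonality regularizer is not spelled out in this summary, so the off-diagonal covariance penalty used here is an assumption, not the paper's definition:

```python
import numpy as np

def sbm_loss(phi, theta_tilde, q_targets):
    """Single-batch sketch of L_SBM = L1 + L2 + L_orth.
    phi:         (b, d) features phi(s, a) at sampled state-actions
    theta_tilde: (b, d) transported parameters theta_tilde(theta), one per sample
    q_targets:   (b,)   Q_theta(s, a) values for the sampled (theta, s, a)"""
    b = phi.shape[0]
    Lambda1 = phi.T @ phi / b                    # batch feature covariance
    Lambda2 = theta_tilde.T @ theta_tilde / b    # batch parameter covariance
    cross = q_targets * np.einsum('bd,bd->b', phi, theta_tilde)  # Q * phi^T theta_tilde

    l1 = np.mean(np.einsum('bd,de,be->b', phi, Lambda2, phi)) - 2 * cross.mean()
    l2 = np.mean(np.einsum('bd,de,be->b', theta_tilde, Lambda1, theta_tilde)) - 2 * cross.mean()

    # Assumed orthogonality term: penalize off-diagonal covariance entries
    off = lambda M: M - np.diag(np.diag(M))
    l_orth = np.sum(off(Lambda1) ** 2) + np.sum(off(Lambda2) ** 2)
    return l1 + l2 + l_orth

rng = np.random.default_rng(2)
b, d = 32, 4
loss = sbm_loss(rng.normal(size=(b, d)), rng.normal(size=(b, d)), rng.normal(size=b))
```

In practice both `phi` and `theta_tilde` would be network outputs and this scalar would be minimized by gradient descent, alternating with the Q-learning update.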

4. Algorithmic Integration and Computational Considerations

Integrating SBM with value-based RL methods requires minimal changes:

  • After each Q-learning update of $\theta$:

$$\theta_{t+1} = \arg\min_\theta \mathbb{E}_{(s, a) \sim \mathcal{D}}\big[(r + \gamma \max_{a'} \phi_t(s', a')^\top \theta^- - \phi_t(s, a)^\top \theta)^2\big].$$

  • Update the feature map by minimizing the spectral Bellman loss:

$$\phi_{t+1} \leftarrow \arg\min_\phi \mathcal{L}_{\text{SBM}}(\phi, \tilde{\theta}; \rho_t, \nu_t),$$

with $\nu_t = \mathcal{N}(\theta_{t+1}, \sigma_{\text{rep}}^2 I)$ and $\rho_t$ from uniform or replay-buffer sampling.

Structured exploration is provided by Thompson sampling in parameter space:

$$\theta_{\text{TS}} \sim \mathcal{N}(\theta_t, \sigma_{\text{exp}} \Sigma_t^{-1}), \qquad \Sigma_t = \lambda I + \sum_i \phi_t(s_i, a_i)\, \phi_t(s_i, a_i)^\top.$$

Computationally, covariance estimation and regularization cost $O(bd^2)$ per mini-batch, with no need for large-scale SVD. A full $d \times d$ eigendecomposition ($O(d^3)$) is needed only for the exploration variance; this cost can be amortized or approximated.
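The parameter-space Thompson sampling step can be sketched directly from these formulas; batch size, dimensions, and hyperparameter values below are illustrative placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d, lam, sigma_exp = 8, 1e-2, 0.1   # feature dim, ridge weight, exploration scale

def thompson_sample(theta_t, phi_batch):
    """Draw theta_TS ~ N(theta_t, sigma_exp * Sigma_t^{-1}),
    with Sigma_t = lam * I + sum_i phi_i phi_i^T over the batch."""
    Sigma = lam * np.eye(d) + phi_batch.T @ phi_batch   # O(b d^2) covariance build
    cov = sigma_exp * np.linalg.inv(Sigma)              # O(d^3), amortizable
    return rng.multivariate_normal(theta_t, cov)

phi_batch = rng.normal(size=(64, d))           # features from a replay batch
theta_ts = thompson_sample(np.zeros(d), phi_batch)
greedy_scores = phi_batch @ theta_ts           # act greedily w.r.t. the sampled Q
```

Directions with little feature coverage have small entries in $\Sigma_t$, hence large posterior variance, so sampled parameters deviate most along under-explored feature directions.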

5. Exploration, Long-Horizon Credit Assignment, and Empirical Impact

Bellman-closed feature transport underpins structured exploration by encoding uncertainty directly in the feature covariance structure. The quantity $\max_\pi \|\phi_\pi\|_{\Sigma_t^{-1}}$, where $\phi_\pi = \mathbb{E}_{d^\pi}[\phi(s, a)]$, is the exploration-critical uncertainty term tractably minimized via Thompson sampling. This mechanism is especially effective in long-horizon, hard-exploration scenarios (Nabati et al., 17 Jul 2025).

In empirical studies on Atari-57 and the “Atari Explore” suite (e.g., Montezuma’s Revenge, Pitfall!, Skiing), SBM-enhanced agents demonstrated substantial gains:

| Agent | Base (Atari-57) | SBM+TS (Atari-57) | Base (Atari Explore) | SBM+TS (Atari Explore) |
|-------|-----------------|-------------------|----------------------|------------------------|
| DQN   | 1.61            | 1.91              | 0.22                 | 0.42                   |
| R2D2  | 3.2             | 3.51              | 0.42                 | 0.67                   |

These improvements are most pronounced for long-horizon, sparse-reward tasks, indicating effective feature transport and credit assignment under Bellman dynamics.

6. Extensions, Generalizations, and Limitations

SBM naturally extends to multi-step Bellman operators. For the $h$-step Bellman operator:

$$\mathrm{IBE}_\phi^h := \sup_\theta \inf_{\tilde{\theta}} \|\mathcal{T}^h Q_\theta - Q_{\tilde{\theta}}\|_\infty,$$

with the bound $\mathrm{IBE}_\phi^h \le \sum_{i=0}^{h-1} \gamma^i\, \mathrm{IBE}_\phi \le (1-\gamma)^{-1}\, \mathrm{IBE}_\phi$. Thus, zero IBE for $\mathcal{T}$ implies closure for all $\mathcal{T}^h$, supporting use in deep RL architectures like R2D2 and Retrace-based off-policy targets.
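The multi-step bound follows from the $\gamma$-contraction of $\mathcal{T}$ in the sup norm; a short derivation consistent with the definitions above (choosing $Q_{\theta_{h-1}}$ as a near-minimizer for the $(h-1)$-step error):

```latex
\begin{aligned}
\mathrm{IBE}_\phi^h
  &\le \sup_\theta \Big( \big\| \mathcal{T}(\mathcal{T}^{h-1} Q_\theta)
        - \mathcal{T} Q_{\theta_{h-1}} \big\|_\infty
      + \inf_{\tilde\theta} \big\| \mathcal{T} Q_{\theta_{h-1}}
        - Q_{\tilde\theta} \big\|_\infty \Big)
      && \text{(triangle inequality)} \\
  &\le \gamma\, \mathrm{IBE}_\phi^{h-1} + \mathrm{IBE}_\phi
      && (\gamma\text{-contraction of } \mathcal{T}) \\
  &\le \sum_{i=0}^{h-1} \gamma^i\, \mathrm{IBE}_\phi
   \;\le\; (1-\gamma)^{-1}\, \mathrm{IBE}_\phi
      && \text{(unroll; geometric series).}
\end{aligned}
```

Setting $\mathrm{IBE}_\phi = 0$ collapses every term, which is exactly the claim that one-step closure implies $h$-step closure.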

Limitations include sensitivity to the parameter sampling variance $\sigma_{\text{rep}}$, incomplete theory for approximate (nonzero) IBE, and empirical validation currently focused on Atari benchmarks. Generalization to continuous control, richer $\tilde{\theta}(\theta)$ parametrizations, and convergence analysis under stochastic updates remain active research directions.


Bellman-Closed Feature Transport, as instantiated by the Spectral Bellman Method, provides a spectral framework ensuring representational alignment with Bellman dynamics. This approach yields theoretical guarantees of closure, empirical improvements in exploration and credit assignment, and flexible integration with value-based RL algorithms, advancing unified perspectives on representation and exploration in RL (Nabati et al., 17 Jul 2025).
