Forward-Backward Representations (FB)
- Forward-Backward Representations are a mathematical framework that factorizes target quantities into forward (predictive) and backward (reconstructive) components, yielding a clean description of the underlying dynamical system.
- They facilitate efficient policy computation, occupancy measure estimation, and rapid task adaptation in reinforcement learning through dual decomposition and latent embedding.
- FB methods extend to nonlinear and hierarchical encodings as well as operator splitting strategies, improving performance in zero-shot RL, offline RL, and diverse optimization applications.
Forward-Backward Representations (FB) are a set of mathematical and algorithmic frameworks, most notably in reinforcement learning and scientific computing, that build on the dual decomposition of functional, probabilistic, or optimization quantities into "forward" and "backward" components. They are widely used for constructing efficient representations of policy dynamics, occupancy measures, and structured solution spaces, and enable a range of applications from zero-shot policy computation to advanced optimization and inference. FB representations also encompass operator splitting strategies in optimization and equilibrium problems, as well as factorized formulations in stochastic systems, offering both theoretical generality and practical effectiveness.
1. Theoretical Foundations and Canonical Formulation
FB representations are defined by a factorization of a target quantity—often the discounted state-action occupancy measure or the solution map of a composite operator—into two components:
- The "forward" map $F$, which encodes predictive or prospective information (e.g., future occupancy, reachability, or propagated dynamics).
- The "backward" map $B$, which encodes retrospective or reconstructive information (e.g., how states are reconstructed, or how test attributes map into the solution).
For zero-shot reinforcement learning (RL), this is formalized by a low-rank factorization of the discounted occupancy (successor) measure:
$$
M^{\pi_z}(s_0, a_0, X) = \int_X F(s_0, a_0, z)^\top B(s, a)\, \rho(ds, da),
$$
where $M^{\pi_z}$ is the occupancy measure under policy $\pi_z$ parameterized by latent reward embedding $z$, $F$ computes "forward" features, and $B$ reconstructs encountered state-action combinations (Touati et al., 2021, Urpí et al., 7 Jul 2025).
A corresponding latent embedding for a new reward function $r$ is computed as:
$$
z_r = \mathbb{E}_{(s,a) \sim \rho}\big[r(s,a)\, B(s,a)\big],
$$
yielding a Q-value for any $(s,a)$:
$$
Q_r^{\pi_{z_r}}(s,a) = F(s, a, z_r)^\top z_r.
$$
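As a concrete illustration, this factorization can be sketched with toy tabular stand-ins for the trained networks. The tables, dimensions, and the state-only simplification of $B$ (and of the reward) are illustrative assumptions, not the published architecture; in particular, the $z$-conditioning of $F$ is dropped for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8          # latent dimension
n_states = 50  # toy discrete state space
n_actions = 4

# Stand-ins for trained networks: F maps (s, a) -> R^d (z-conditioning
# omitted for brevity), B maps s -> R^d.  Fixed random tables here.
F_table = rng.normal(size=(n_states, n_actions, d))
B_table = rng.normal(size=(n_states, d))

def task_embedding(reward, rho_states):
    """z_r = E_{s ~ rho}[ r(s) B(s) ], estimated from samples of rho."""
    return np.mean([reward(s) * B_table[s] for s in rho_states], axis=0)

def q_values(s, z):
    """Q_r(s, a) ~= F(s, a)^T z_r for each action a."""
    return F_table[s] @ z

# Zero-shot adaptation: specify a new reward, embed it, act greedily.
goal = 7
reward = lambda s: float(s == goal)
rho_samples = rng.integers(0, n_states, size=1000)
z_r = task_embedding(reward, rho_samples)
q = q_values(s=3, z=z_r)
best_action = int(np.argmax(q))
```

No retraining occurs at test time: only the expectation defining $z_r$ is estimated from samples, after which Q-values for any state follow by a single inner product.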
2. Learning and Exploration: From Reward-Free Data to Efficient Task Adaptation
FB representation learning proceeds in two phases:
- Unsupervised Phase: Reward-free exploration generates trajectories that are used to train $F$ and $B$ (typically using TD learning with deep neural parameterization), often with transitions stored in a replay buffer. The loss enforces consistency with Bellman or discounted occupancy equations, e.g., for RL:
  $$
  \mathcal{L}(F, B) = \mathbb{E}\Big[\big(F(s_t, a_t, z)^\top B(s', a') - \gamma\, \bar F(s_{t+1}, \pi_z(s_{t+1}), z)^\top \bar B(s', a')\big)^2\Big] - 2\, \mathbb{E}\big[F(s_t, a_t, z)^\top B(s_{t+1}, a_{t+1})\big],
  $$
  where transitions $(s_t, a_t, s_{t+1}, a_{t+1})$ come from the replay buffer, $(s', a')$ is drawn from the state-action marginal $\rho$, and $\bar F, \bar B$ denote target networks.
- Test/Task Phase: Upon specification of an arbitrary reward $r$, its latent representation $z_r$ is computed as above, and an optimal or near-optimal policy is immediately extracted, often by
  $$
  \pi_{z_r}(s) = \arg\max_a F(s, a, z_r)^\top z_r.
  $$
Recent advances (Urpí et al., 7 Jul 2025) address the crucial issue of exploration coupled with FB learning: rather than relying on decoupled, general-purpose exploration, the exploration policy is chosen to minimize the posterior variance (epistemic uncertainty) of $F$, as measured by the variance of Q-value predictions across an ensemble of forward networks.
This exploration is performed in the latent reward embedding space and is shown empirically to dramatically improve sample efficiency in zero-shot RL.
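A minimal sketch of this ensemble-variance signal, assuming a hypothetical ensemble of tabular forward maps for one state (the shapes and the simple "act where the ensemble disagrees most" rule are illustrative simplifications of the actual method):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_actions, n_members = 8, 4, 5

# Hypothetical ensemble of forward maps F_i(s, a); random tables stand in
# for trained networks, for a single fixed state.
F_ensemble = rng.normal(size=(n_members, n_actions, d))
z = rng.normal(size=d)

# Ensemble Q predictions: shape (members, actions).
Q = F_ensemble @ z

# Epistemic-uncertainty proxy: variance of Q across ensemble members.
q_var = Q.var(axis=0)

# Uncertainty-directed exploration: visit where the ensemble disagrees
# most, so that subsequent updates shrink the posterior variance of F.
explore_action = int(np.argmax(q_var))
```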
3. Extensions and Richer Representations
Nonlinear and Hierarchical Task Encodings
Standard FB relies on a linear task encoding:
$$
z_r = \mathbb{E}_{(s,a) \sim \rho}\big[r(s,a)\, B(s,a)\big].
$$
However, this linear structure restricts the class of representable reward functions and can impede spatial precision (e.g., for goal-reaching). To overcome this, (Cetin et al., 5 Dec 2024) proposes auto-regressive FB representations in which $B$ is decomposed into blocks $B = (B^1, \ldots, B^K)$,
where each block $B^k$ can condition on the embeddings $z^{1:k-1}$ produced by the previous levels, thus enabling universal approximation of continuous task-encoding maps. This yields substantially improved expressivity and localization for fine-grained or out-of-distribution tasks.
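A toy sketch of such a block-wise, auto-regressive task encoding (the linear block maps `W`, the state features `S`, and all dimensions are invented for illustration; the actual method uses learned networks):

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, K, block = 60, 3, 4   # K auto-regressive blocks of size `block`

# Hypothetical block-wise backward maps B^k(s, z^{1:k-1}): each block mixes
# raw state features with the embedding produced by the earlier blocks.
S = rng.normal(size=(n_states, block))
W = [rng.normal(size=(block + k * block, block)) for k in range(K)]

def encode_task(reward_vec, rho):
    """Compute z = (z^1, ..., z^K), each block conditioning on earlier ones."""
    z_blocks = []
    for k in range(K):
        prev = np.concatenate(z_blocks) if z_blocks else np.zeros(0)
        # B^k(s, z^{1:k-1}) for every state s, as a linear toy stand-in.
        feats = np.concatenate([S, np.tile(prev, (n_states, 1))], axis=1) @ W[k]
        z_blocks.append((rho * reward_vec) @ feats)   # E_rho[ r(s) B^k(...) ]
    return np.concatenate(z_blocks)

rho = np.full(n_states, 1.0 / n_states)
reward_vec = (np.arange(n_states) == 10).astype(float)
z = encode_task(reward_vec, rho)
```

Because later blocks see the earlier blocks' output, the overall map from rewards to embeddings is nonlinear even though each individual block here is linear.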
Offline RL and Behavioral Foundation Models
Adapting FB training to offline RL settings necessitates several techniques:
- Advantage-weighted policy optimization: Actor updates incorporate advantage weighting (AWAC-like), with weights proportional to $\exp\!\big(A(s,a)/\beta\big)$ for a temperature $\beta$, to favor high-return actions from offline data.
- Ensemble prediction and evaluation-based action selection to mitigate uncertainty and reduce overestimation.
- Averaged Bellman targets across forward-network ensemble members, enhancing learning stability in predicting successor measures (Cetin et al., 5 Dec 2024).
Empirically, these improvements allow FB behavioral foundation models (BFMs) to match or outperform single-task offline RL agents (e.g., IQL, XQL) on D4RL and MOOD benchmarks.
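The advantage-weighting ingredient above can be sketched in a few lines (the temperature and clipping constant are illustrative choices, not values from the cited papers):

```python
import numpy as np

def awac_weights(q, v, beta=1.0, clip=100.0):
    """Advantage-weighted coefficients w = exp(A / beta), A = Q - V,
    clipped for numerical stability (AWAC-style weighting)."""
    adv = q - v
    return np.minimum(np.exp(adv / beta), clip)

# Offline batch: high-advantage actions receive exponentially larger
# weight in the actor's (log-likelihood) loss.
q = np.array([1.0, 0.2, -0.5])
v = np.array([0.0, 0.0, 0.0])
w = awac_weights(q, v)
```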
4. Mathematical Structures in Optimization, Control, and Inference
FB representations are not unique to RL or sequential prediction. They are intrinsic to several key operator splitting and optimization frameworks:
- Forward-Backward Splitting/Envelope: In composite nonsmooth minimization of $f(x) + g(x)$, with $f$ smooth and $g$ nonsmooth but proximable,
the forward-backward operator $T_\gamma(x) = \operatorname{prox}_{\gamma g}\big(x - \gamma \nabla f(x)\big)$ and forward-backward envelope (FBE) recast the solution of nonsmooth convex (or nonconvex) problems as the minimization of a differentiable surrogate (Stella et al., 2016, Ozaslan et al., 30 Jul 2024). The FBE is defined by
  $$
  \varphi_\gamma(x) = f(x) - \tfrac{\gamma}{2}\,\|\nabla f(x)\|^2 + g^{\gamma}\big(x - \gamma \nabla f(x)\big),
  $$
  where $g^{\gamma}$ is the Moreau envelope of $g$,
with critical points aligned with those of the original objective, and with variable-metric quasi-Newton extensions attaining superlinear convergence under mild conditions.
- Nonlinear Operator Splitting (NOFOB): Generalizes FB splitting to nonlinear resolvents and supports complex multi-operator decompositions (e.g., Tseng’s forward-backward-forward, Bregman/Fenchel duality), and yields strong theoretical guarantees under monotonicity and metric subregularity (Giselsson, 2019).
- Equilibrium and Saddle Point Problems: FB/FBF dynamics extend to monotone inclusions, saddle-point problems, and bilevel equilibrium systems, even with relaxed conditions—e.g., monotonicity and Lipschitz regularity without cocoercivity (Mittal et al., 18 Mar 2024).
- Plug-and-Play (PnP) Algorithms: In imaging, the "backward" step of FB algorithms can be replaced by a (possibly unrolled) iterative denoiser. Both analysis and synthesis denoisers can be embedded, with convergence guaranteed under mild assumptions provided a warm-restart or primal-dual equivalence (Kowalski et al., 20 Nov 2024).
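The forward-backward operator above can be made concrete with a minimal proximal-gradient (ISTA-style) sketch for $\ell_1$-regularized least squares, taking $f(x) = \tfrac12\|Ax - b\|^2$ and $g(x) = \lambda\|x\|_1$ (problem sizes, regularization weight, and iteration count are illustrative):

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal (backward) step for g = t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward(A, b, lam, gamma, iters=500):
    """Minimize f(x) + g(x) with f = 0.5||Ax - b||^2 (smooth: explicit
    forward gradient step) and g = lam * ||x||_1 (nonsmooth: backward
    proximal step)."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                           # forward step
        x = soft_threshold(x - gamma * grad, gamma * lam)  # backward step
    return x

rng = np.random.default_rng(3)
A = rng.normal(size=(30, 10))
x_true = np.zeros(10)
x_true[[1, 4]] = [2.0, -1.5]
b = A @ x_true
gamma = 1.0 / np.linalg.norm(A, 2) ** 2   # step size 1/L, L = ||A||_2^2
x_hat = forward_backward(A, b, lam=0.1, gamma=gamma)
```

With a noiseless right-hand side and small $\lambda$, the iterates recover the sparse support up to the usual small $\ell_1$ shrinkage bias.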
5. Applications
| Domain | FB Role | Example Reference |
|---|---|---|
| Zero-shot RL / BFM | Policy occupancy factorization, task embedding, fast adaptation | (Touati et al., 2021, Urpí et al., 7 Jul 2025, Cetin et al., 5 Dec 2024) |
| Autonomous Driving / BEV | Forward-backward projection for BEV feature aggregation | (Li et al., 2023, Li et al., 2023) |
| Operator Splitting / Opt. | Quasi-Newton FB splitting, NOFOB methods, FBF for minimax | (Stella et al., 2016, Giselsson, 2019, Mittal et al., 18 Mar 2024) |
| Stochastic DEs (FBSDEs) | Coupling forward SDEs with backward cost-value SDEs | (Issoglio et al., 2016) |
| High-energy/collider Physics | Asymmetry and correlations (FB, A_FB, etc.) | (Wang et al., 2010, Jung et al., 2010, Han et al., 2011, Mondal et al., 2021) |
- In zero-shot RL, FB enables quick computation of optimal/near-optimal policies for arbitrary tasks via a single pass of a trained representation.
- In visual 3D scene understanding, such as in BEV-based autonomous driving perception, forward-backward transformations improve geometric and context-rich representation of spatial environments, enhancing detection and occupancy inference (Li et al., 2023, Li et al., 2023).
- In large-scale optimization and inverse imaging, FB (and its plug-and-play extensions) enables the deployment of powerful denoisers and second-order acceleration within the theoretically principled proximal splitting frameworks.
- The FB framework is inherently extensible to nonconvex, stochastic, or multi-agent systems, by exploiting its compatibility with envelope methods and factorized uncertainty quantification.
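As a sketch of the plug-and-play idea referenced above, the backward (prox) step of a forward-backward iteration can be swapped for a denoiser; here a sliding-window median stands in for a learned denoiser, and the signal, step size, and iteration count are all illustrative:

```python
import numpy as np

def median_denoiser(x, k=3):
    """Toy stand-in for a learned denoiser: 1-D sliding-window median."""
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    return np.array([np.median(xp[i:i + k]) for i in range(len(x))])

def pnp_forward_backward(y, gamma=0.5, iters=20):
    """PnP-FB for denoising: forward gradient step on f(x) = 0.5||x - y||^2,
    then the 'backward' prox is replaced by a plug-in denoiser."""
    x = y.copy()
    for _ in range(iters):
        x = x - gamma * (x - y)        # forward: data-fidelity gradient step
        x = median_denoiser(x)         # backward: prox swapped for denoiser
    return x

rng = np.random.default_rng(4)
clean = np.repeat([0.0, 1.0, 0.0], 20)          # piecewise-constant signal
noisy = clean + 0.1 * rng.normal(size=clean.size)
restored = pnp_forward_backward(noisy)
```

The median filter preserves the step edges of the piecewise-constant signal while averaging out noise, which is exactly the role the plugged-in denoiser plays in place of an analytic proximal map.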
6. Limitations, Challenges, and Future Directions
While FB frameworks have provided strong theoretical and empirical advances, several limitations and directions for further research remain:
- Exploration in RL: Integrating epistemic uncertainty-directed exploration directly into the FB learning pipeline (rather than relying on external exploration mechanisms) is essential for maximal sample efficiency in zero-shot settings (Urpí et al., 7 Jul 2025).
- Expressivity in Task Encoding: Overcoming representational linearity is critical for fine-grained task generalization. Auto-regressive and nonlinear features are promising approaches, but scaling and stability issues, particularly with deep or high-dimensional reward spaces, remain active areas (Cetin et al., 5 Dec 2024).
- Offline RL and Data Shift: Robust techniques such as ensemble prediction, evaluation-based sampling, and advantage-weighted updates are necessary to prevent catastrophic performance collapse on offline data, especially when task-reward distributions differ substantially between train and test.
- Computational Scalability: For large-scale perception (e.g. in FB-OCC), efficient post-processing and joint pre-training strategies are needed to handle domain gaps between 2D/3D features and to mitigate degradation at long ranges or under incomplete data.
- Extending FBE and NOFOB: Efforts to generalize the FBE and NOFOB frameworks to broader classes of nonsmooth, nonconvex optimization, and to unified multi-operator and multi-modal domains, remain an active line of theoretical development.
7. Summary Table: Characteristic Structures of Forward-Backward Representations
| Component | Definition / Role | Applications |
|---|---|---|
| Forward ($F$) | Predictive encoding of dynamics, or explicit evaluation in splitting | RL occupancy, optimization |
| Backward ($B$) | Reconstructive or test-feature map, or implicit (proximal/dual) operator | Policy embedding, denoising |
| Latent $z$ | Reward/task embedding, often via projection onto $B$ features | Zero-shot policies |
| FB Envelope | Differentiable objective surrogate | Accelerated optimization |
| FB Splitting | Operator/proximal decomposition | Monotone inclusions |
In conclusion, forward-backward representations offer a unifying language and toolbox for dynamic modeling, sample-efficient learning, optimization, and probabilistic inference in high-dimensional and structured systems. Their adaptability and extensibility make them a central theoretical and algorithmic paradigm in contemporary data-driven and scientific computing research.