Forward-Backward Splitting Framework

Updated 5 August 2025
  • Forward-backward splitting alternates a forward (gradient) step on the smooth component of an objective with a backward (proximal) step on the nonsmooth part, so each term is handled by the operation best suited to it.
  • The framework extends to parallel, variable metric, and Bregman settings, providing robust convergence guarantees in both Hilbert and Banach spaces, even under stochastic errors.
  • Advanced variants like nonlinear, reflected, and multistep schemes have improved practical performance in applications such as image restoration, machine learning, and distributed optimization.

The forward-backward splitting framework encompasses a class of operator and function splitting algorithms for convex (and more generally, monotone or structured nonconvex) optimization and inclusion problems. Central to modern convex optimization, variational analysis, and inverse problems, this framework exploits the decomposability of objectives and monotone operators, enabling the design of efficient iterative algorithms that alternate between explicit (forward) and implicit (backward) evaluations. The framework covers classical proximal gradient methods and extends to models involving sums of smooth and multiple nonsmooth components, variable metric and Bregman distances, stochastic settings, and generalized convexity, with rigorous convergence analysis in infinite-dimensional Hilbert and Banach spaces.

1. Classical and Generalized Forward-Backward Splitting

The prototypical problem is to minimize $\Psi(x) = F(x) + \sum_{i=1}^n G_i(x)$, where $F$ is convex and has a Lipschitz continuous gradient, and each $G_i$ is convex and “simple” in the sense that its Moreau proximity operator can be evaluated efficiently. For $n = 1$, the classical forward-backward splitting (FBS) alternates between a forward gradient step on $F$ and a backward (proximal) step on $G_1$:

$$x_{t+1} = \operatorname{prox}_{\gamma G_1}\bigl(x_t - \gamma \nabla F(x_t)\bigr),$$

where $\gamma$ is the step size.
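
For concreteness, take $F(x) = \tfrac{1}{2}\|Ax - b\|^2$ and $G_1(x) = \mu\|x\|_1$, so the proximal step is componentwise soft-thresholding and FBS specializes to ISTA. A minimal Python sketch under these assumptions (the data `A`, `b`, the weight `mu`, and the fixed step $\gamma = 1/L$ are illustrative, not taken from the cited works):

```python
import numpy as np

def fbs_lasso(A, b, mu, num_iters=500):
    """Classical forward-backward splitting (ISTA) for
    min_x 0.5*||Ax - b||^2 + mu*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of grad F
    gamma = 1.0 / L                            # fixed step size in (0, 2/L)
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)               # forward (gradient) step on F
        z = x - gamma * grad
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * mu, 0.0)  # prox of gamma*mu*||.||_1
    return x

# illustrative usage on a random sparse-recovery instance
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
x_hat = fbs_lasso(A, A @ x_true, mu=0.1)
```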

The generalized forward-backward splitting (Raguet et al., 2011) extends to $n > 1$, introducing auxiliary variables $z_{i,t}$ for each $G_i$, which are updated in parallel:

$$z_{i,t+1} = z_{i,t} + \lambda_t \left[\operatorname{prox}_{(\gamma_t/\omega_i) G_i}\bigl(2x_t - z_{i,t} - \gamma_t \nabla F(x_t)\bigr) - x_t\right],$$

with $x_t = \sum_i \omega_i z_{i,t}$ and $\sum_i \omega_i = 1$. This fully decouples the nonsmooth terms and enables efficient parallelization.
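
A minimal sketch of this parallel update for two simple regularizers, say $G_1 = \mu\|\cdot\|_1$ and $G_2$ the indicator of the nonnegative orthant; the regularizers, equal weights $\omega_i = 1/n$, and relaxation $\lambda_t = 1$ are illustrative assumptions:

```python
import numpy as np

def generalized_fbs(grad_F, proxes, dim, gamma, num_iters=500):
    """Generalized forward-backward: one gradient step on F per iteration,
    parallel proximal steps on each G_i via auxiliary variables z_i."""
    n = len(proxes)
    omega = np.full(n, 1.0 / n)                       # weights summing to one
    z = [np.zeros(dim) for _ in range(n)]
    x = np.zeros(dim)
    for _ in range(num_iters):
        g = grad_F(x)
        for i in range(n):
            # z_i <- z_i + [prox_{(gamma/omega_i) G_i}(2x - z_i - gamma*g) - x], with lambda_t = 1
            z[i] = z[i] + proxes[i](2 * x - z[i] - gamma * g, gamma / omega[i]) - x
        x = sum(w * zi for w, zi in zip(omega, z))    # x_t = sum_i omega_i z_i
    return x

# illustrative proxes: l1 penalty (mu = 0.1) and nonnegativity constraint
prox_l1 = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - 0.1 * t, 0.0)
prox_nonneg = lambda v, t: np.maximum(v, 0.0)
# F(x) = 0.5*||x - c||^2, so grad F(x) = x - c and the Lipschitz constant is 1
c = np.linspace(-1.0, 1.0, 20)
x_hat = generalized_fbs(lambda x: x - c, [prox_l1, prox_nonneg], dim=20, gamma=1.0)
```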

The framework is equivalently interpreted as finding a zero of a sum of maximally monotone and co-coercive (Lipschitz continuous gradient) operators, recast via resolvents and fixed-point equations:

$$0 \in \nabla F(x) + \sum_i \partial G_i(x).$$

The algorithm’s key fixed-point operator is shown to be firmly nonexpansive, and convergence analysis leverages monotone operator theory, with robustness to summable computational errors in gradient and proximal computations.

2. Extensions: Metrics, Bregman Distances, and Non-Euclidean Settings

Standard forward-backward methods employ the Euclidean metric. Variable metric extensions (Combettes et al., 2012) introduce a sequence of self-adjoint, positive-definite linear operators $U_n$ (variable metrics), leading to updates:

$$y_n = x_n - \gamma_n U_n B x_n, \qquad x_{n+1} = x_n + \lambda_n \bigl[ J_{\gamma_n U_n A}(y_n) - x_n \bigr].$$

The freedom in choosing the metric allows for preconditioning (akin to quasi-Newton methods) and rapid adaptation to local geometry, which can be crucial for ill-conditioned problems.
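
A minimal sketch with a diagonal metric (a Jacobi-style preconditioner built from the diagonal of $A^\top A$ for an $\ell_1$-regularized least-squares problem); the choice of metric, step size, and problem data are illustrative assumptions:

```python
import numpy as np

def variable_metric_fbs(A, b, mu, num_iters=500):
    """Forward-backward with a fixed diagonal metric U: for diagonal U the
    prox of the l1 norm in the induced metric is still componentwise
    soft-thresholding, with per-coordinate thresholds gamma * U_ii * mu."""
    U = 1.0 / (np.sum(A * A, axis=0) + 1e-12)   # inverse diagonal of A^T A (Jacobi preconditioner)
    A_scaled = A * np.sqrt(U)                   # A U^{1/2}, used for a safe step size
    gamma = 1.0 / (np.linalg.norm(A_scaled, 2) ** 2)
    x = np.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)
        z = x - gamma * U * grad                # forward step preconditioned by U
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * U * mu, 0.0)  # metric-scaled prox
    return x
```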

Bregman forward-backward splitting (Nguyen, 2015; Bùi et al., 2019) replaces the quadratic proximity term with a Bregman distance generated by a strongly convex, differentiable kernel $f$:

$$x_{n+1} = \operatorname{Prox}^{f_n}_{\gamma_n \psi}\bigl(\nabla f_n(x_n) - \gamma_n L^* \nabla\psi(L x_n)\bigr).$$

This generalization enables operation in reflexive Banach spaces, enhances modeling flexibility (e.g., Kullback-Leibler divergence in imaging), and leads to algorithms better tailored to problem structure than their Euclidean counterparts. Convergence relies on variable quasi-Bregman monotonicity, which ensures that the distance to the set of minimizers decreases up to summable errors.
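
For example, with the negative-entropy kernel $f(x) = \sum_i x_i \log x_i$ on the probability simplex, the Bregman proximal step reduces to a multiplicative (exponentiated-gradient) update. A minimal sketch for minimizing a smooth $F$ over the simplex with no additional nonsmooth term; the objective and step size are illustrative assumptions:

```python
import numpy as np

def bregman_fbs_simplex(grad_F, dim, gamma=0.5, num_iters=300):
    """Bregman forward-backward with the entropy kernel f(x) = sum_i x_i log x_i:
    the mirror update nabla f*(nabla f(x) - gamma * grad F(x)) becomes a
    normalized exponentiated-gradient step on the probability simplex."""
    x = np.full(dim, 1.0 / dim)                # start at the simplex center
    for _ in range(num_iters):
        w = x * np.exp(-gamma * grad_F(x))     # exp(log x - gamma * grad)
        x = w / w.sum()                        # Bregman projection back onto the simplex
    return x

# illustrative usage: minimize F(x) = 0.5*||x - c||^2 over the simplex
c = np.array([0.7, 0.2, 0.05, 0.05])
x_hat = bregman_fbs_simplex(lambda x: x - c, dim=4)
```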

3. Stochastic and Inexact Forward-Backward Methods

Forward-backward algorithms have been extended to settings with stochastic or inexact operator evaluations (Rosasco et al., 2014). Iterates update as

$$z_n = w_n - \gamma_n \xi_n, \quad y_n = J_{\gamma_n A}(z_n), \quad w_{n+1} = (1 - \lambda_n) w_n + \lambda_n y_n,$$

where $\xi_n$ is a stochastic surrogate for the operator (e.g., a stochastic gradient) with controlled error variance. With appropriately decaying stepsizes, almost sure convergence is achieved, and optimal $O(1/n)$ rates (in mean-squared error) are attained for strongly monotone inclusions. Stochastic quasi-Fejér sequence arguments underpin these results, and, importantly, iterate averaging (which would reduce sparsity) is not required for optimal rates.
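
A minimal sketch of a stochastic proximal gradient method for $\ell_1$-regularized least squares, where $\xi_n$ is a single-row gradient surrogate and $\gamma_n \propto 1/n$ (the sampling scheme, relaxation $\lambda_n = 1$, and constants are illustrative assumptions; the $O(1/n)$ rate mentioned above additionally requires strong monotonicity):

```python
import numpy as np

def stochastic_fbs(A, b, mu, num_iters=5000, gamma0=1.0, seed=0):
    """Stochastic forward-backward: forward step with an unbiased single-row
    gradient estimate xi_n, backward (prox) step on mu*||.||_1, decaying steps."""
    rng = np.random.default_rng(seed)
    m, d = A.shape
    x = np.zeros(d)
    for n in range(1, num_iters + 1):
        i = rng.integers(m)
        xi = m * A[i] * (A[i] @ x - b[i])      # unbiased surrogate for A^T (Ax - b)
        gamma = gamma0 / n                     # decaying stepsize schedule
        z = x - gamma * xi                     # stochastic forward step
        x = np.sign(z) * np.maximum(np.abs(z) - gamma * mu, 0.0)   # backward (prox) step
    return x
```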

4. Advanced Splitting: Nonlinear, Reflected, and Multistep Schemes

Nonlinear and reflected variants further extend the scope of FBS. Nonlinear forward-backward splitting with projection correction (NOFOB) (Giselsson, 2019) introduces flexibility via nonlinear and non-symmetric resolvent kernels $M_k$, subsuming classical FBS, forward-backward-forward (Tseng's method), and various primal-dual schemes. The iteration becomes

$$\hat{x}_k = (M_k + A)^{-1}(M_k - C)(x_k), \quad x_{k+1} = (1 - \theta_k) x_k + \theta_k \Pi_{H_k}^S(x_k),$$

where $\Pi_{H_k}^S$ denotes projection onto an affine halfspace determined by the separating hyperplane generated by the current step.

Forward-reflected-backward splitting (Malitsky et al., 2018) handles monotone but non-cocoercive $B$ by introducing a reflection correction:

$$x_{k+1} = J_{\lambda A}\bigl(x_k - 2\lambda B(x_k) + \lambda B(x_{k-1})\bigr), \quad 0 < \lambda < 1/(2L),$$

offering convergence with the minimal number of forward evaluations per iteration under weaker assumptions than classical FBS.
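
A minimal sketch on a toy monotone inclusion $0 \in Mz + q + N_C(z)$ with skew-symmetric $M$ (monotone and Lipschitz, but not cocoercive) and $C$ a box, so that $J_{\lambda A}$ is a simple projection; the operator, box, and step size are illustrative assumptions:

```python
import numpy as np

def forward_reflected_backward(B, resolvent, z0, lam, num_iters=2000):
    """Forward-reflected-backward iteration:
    z_{k+1} = J_{lam A}(z_k - 2*lam*B(z_k) + lam*B(z_{k-1}))."""
    z_prev, z = z0.copy(), z0.copy()
    Bz_prev = B(z_prev)                              # convention: z_{-1} = z_0
    for _ in range(num_iters):
        Bz = B(z)
        z_next = resolvent(z - 2 * lam * Bz + lam * Bz_prev)
        z_prev, Bz_prev, z = z, Bz, z_next
    return z

# illustrative problem: B(z) = Mz + q with skew-symmetric M, A = N_C for C = [-1, 1]^2
M = np.array([[0.0, 1.0], [-1.0, 0.0]])
q = np.array([0.3, -0.2])
B = lambda z: M @ z + q
proj_box = lambda v: np.clip(v, -1.0, 1.0)           # resolvent J_{lam A} of the normal cone
L = np.linalg.norm(M, 2)                             # Lipschitz constant of B
z_star = forward_reflected_backward(B, proj_box, np.zeros(2), lam=0.4 / L)
```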

Additional variants incorporate linesearch, inertia, and multi-operator decomposition, facilitating applications ranging from min-max optimization and learning to distributed variational inequalities.

5. Theoretical Properties: Convergence, Identification, and Rates

Convergence analysis for forward-backward schemes is grounded in monotone operator theory, with key results including weak convergence (to a minimizer, or to a solution of the underlying monotone inclusion) under mild assumptions: convexity, smoothness, and summability of errors. Strong convergence is established when uniform convexity is present.

Local linear convergence rates (Q- and R-linear) have been characterized under partial smoothness of the nonsmooth term and nondegeneracy (Liang et al., 2014). If the regularizer $J$ is partly smooth with respect to a manifold $\mathcal{M}$, the algorithm identifies the active manifold in finitely many iterations, and convergence proceeds at a local linear rate determined by problem conditioning along $\mathcal{M}$.

For broader settings, sublinear rates hold: $O(1/k)$ for general convex problems, and $O(1/k^2)$ under acceleration or special structural conditions. In variable metric and Bregman schemes, rates may depend on the geometry induced by the chosen metric or kernel.

Convergence proofs for inexact and stochastic variants use Fejér monotonicity and quasi-Fejér properties, controlling error accumulation via step-size schedules and martingale arguments.

6. Practical Implementation and Applications

Forward-backward splitting is widely used in:

  • Image restoration and deblurring, where multiple nonsmooth regularizers (e.g., group sparsity, total variation, $\ell_1$ norms) are imposed in large-scale inverse problems (Raguet et al., 2011).
  • Support vector machine training, logistic regression, matrix completion, and machine learning tasks, where composite minimization is natural.
  • Signal processing, optimal control, and data fitting, where problem structure can be exploited via split evaluation of smooth, non-smooth, and constraint terms.
  • Distributed optimization and decentralized control, leveraging splitting structure to enable parallel and localized computation.

Efficient practical implementation requires attention to:

  • Stepsize selection (using adaptive methods, linesearch, or spectral formulas);
  • Efficient computation of proximal and projection steps (exploiting structures such as separability or sparsity);
  • Memory and communication considerations in distributed settings.

State-of-the-art software such as FASTA implements advanced FBS schemes with adaptive stepsize, acceleration, backtracking, and flexible problem modeling (Goldstein et al., 2014).
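
As an illustration of the stepsize-selection point above, here is a minimal backtracking variant of the FBS step that shrinks $\gamma$ until a standard quadratic upper-bound (sufficient-decrease) condition on the smooth part holds; this is a generic sketch, not FASTA's actual interface, and practical codes typically also let the step grow again (e.g., via spectral or Barzilai-Borwein rules):

```python
import numpy as np

def fbs_backtracking(F, grad_F, prox_G, x0, gamma0=1.0, shrink=0.5, num_iters=200):
    """Proximal gradient with backtracking: accept the candidate step once
    F(x+) <= F(x) + <grad F(x), x+ - x> + ||x+ - x||^2 / (2*gamma)."""
    x = x0.copy()
    gamma = gamma0
    for _ in range(num_iters):
        g = grad_F(x)
        while True:
            x_new = prox_G(x - gamma * g, gamma)     # backward (prox) step with trial gamma
            d = x_new - x
            if F(x_new) <= F(x) + g @ d + (d @ d) / (2 * gamma):
                break                                # sufficient decrease: accept the step
            gamma *= shrink                          # step too long: shrink and retry
        x = x_new
    return x
```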

7. Unification, Extensions, and Future Directions

The theoretical developments in generalized and variable metric FBS (Xue, 2021; Combettes et al., 2012), nonlinear kernelization (Giselsson, 2019), and inclusion of history and deviation terms (Sadeghi et al., 2021; Sadeghi et al., 2022) demonstrate the unification of a wide variety of splitting and optimization algorithms under a common operator-theoretic perspective. These frameworks extend to Banach spaces, leverage generalized convexity ($\Phi$-convexity) to prove convergence in nonstandard settings (Oikonomidis et al., 2025), and systematically unify primal-dual, ADMM, and multi-operator splitting strategies.

Current research directions seek:

  • Quantitative complexity and rate bounds in the presence of multiple nonsmooth terms and variable metric/Bregman geometries;
  • Multistep and accelerated variants with better convergence profiles;
  • Robustness and adaptivity to operator properties (e.g., Lipschitz constants, strong convexity, or pseudo-monotonicity);
  • Deeper connections with learning, high-dimensional statistics, and decentralized optimization paradigms.

The forward-backward splitting framework, in its many variants, continues to be a fundamental analytical and algorithmic template in modern optimization, imaging, computational mathematics, and machine learning.