Three-Operator Splitting Method

Updated 13 April 2026

Three-operator splitting is an algorithmic framework that decomposes problems into three operators, enabling efficient handling of monotone and smooth components.
It generalizes classic methods like Douglas–Rachford and Forward–Backward schemes via resolvent evaluations and adaptive step-size strategies.
The method offers robust convergence guarantees with empirical performance improvements in convex minimization, PDEs, and distributed optimization.

A three-operator splitting method is an algorithmic framework designed to solve problems where a target operator (typically monotone, or associated with the first-order optimality condition of a minimization problem) naturally decomposes into a sum of three constituent operators, at least one of which has a smooth structure amenable to explicit treatment. This splitting approach generalizes the classic two-operator methods such as Douglas–Rachford and Forward–Backward schemes, and provides increased modeling flexibility and computational efficiency for structured monotone inclusions, composite convex minimization, and partial differential equations.

1. Mathematical Formulation and Fundamental Principles

Three-operator splitting addresses monotone inclusion problems of the form: $\text{Find } x\in \mathcal{H} \text{ such that } 0\in A(x) + B(x) + C(x),$ where $A: \mathcal{H} \rightrightarrows \mathcal{H}$ and $B: \mathcal{H} \rightrightarrows \mathcal{H}$ are maximally monotone operators, and $C: \mathcal{H} \to \mathcal{H}$ is monotone and typically $\beta$-cocoercive (i.e., $\langle Cx - Cy, x-y \rangle \ge \beta \|Cx - Cy\|^2$ for some $\beta > 0$) (Davis et al., 2015).

This framework encompasses convex minimization problems: $\min_{x\in \mathcal{H}} f(x) + g(x) + h(x),$ with $f,g$ proper, closed, convex and $h$ convex with $(1/\beta)$-Lipschitz gradient (so $A = \partial f$, $B = \partial g$ and $C = \nabla h$).
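As a worked instance of this correspondence, consider a least-squares smooth term. The matrix $M$ and vector $b$ are illustrative assumptions, not taken from the cited works, and $\|M\|$ denotes the spectral norm:

$$h(x) = \tfrac{1}{2}\|Mx - b\|^2, \qquad \nabla h(x) = M^\top (Mx - b), \qquad L = \|M\|^2, \qquad \beta = \frac{1}{L} = \frac{1}{\|M\|^2}.$$

By the Baillon–Haddad theorem, a convex function with $L$-Lipschitz gradient has a $(1/L)$-cocoercive gradient, which is why the Lipschitz constant of $\nabla h$ fixes $\beta$. The inclusion $0 \in \partial f(x^\star) + \partial g(x^\star) + \nabla h(x^\star)$ is always sufficient for $x^\star$ to minimize $f + g + h$, and it is also necessary under a standard constraint qualification.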

2. Classical and Algorithmic Schemes

The seminal Davis–Yin three-operator splitting algorithm (Davis et al., 2015) is given as:

Let $\gamma \in (0, 2\beta)$ and start with $z^0 \in \mathcal{H}$.
Iterate:
1. $x_B^k = J_{\gamma B}(z^k)$,
2. $x_A^k = J_{\gamma A}(2x_B^k - z^k - \gamma C x_B^k)$,
3. $z^{k+1} = z^k + \lambda_k (x_A^k - x_B^k)$, $\lambda_k \in (0, 1/\alpha)$ with $\alpha = 2\beta/(4\beta - \gamma)$.

Here $J_{\gamma A} = (I + \gamma A)^{-1}$ denotes the resolvent of $\gamma A$. The operator $T = I - J_{\gamma B} + J_{\gamma A} \circ (2J_{\gamma B} - I - \gamma C \circ J_{\gamma B})$ is $\alpha$-averaged for $\gamma \in (0, 2\beta)$ and fixed points of $T$ are mapped to zeros of $A + B + C$ via $J_{\gamma B}$ (Davis et al., 2015, 1904.11684).
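The three steps of the iteration can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the test problem (box constraint for $f$, $\ell_1$ penalty for $g$, quadratic $h$ with $\beta = 1$), the function names, and the parameter values are all chosen here for illustration. The problem is separable, so its solution is known in closed form as the clipped value of $b - \lambda$.

```python
import math

def soft_threshold(v, t):
    """Prox of t * ||.||_1: shrink every entry toward zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

def project_box(v, lo=0.0, hi=1.0):
    """Prox of the indicator of [lo, hi]^n, i.e. the Euclidean projection."""
    return [min(max(vi, lo), hi) for vi in v]

def davis_yin(prox_f, prox_g, grad_h, z0, gamma, relax=1.0, iters=300):
    """Davis-Yin iteration with A = subdifferential of f, B = subdifferential of g, C = grad h."""
    z = list(z0)
    for _ in range(iters):
        x_b = prox_g(z, gamma)                      # step 1: x_B = J_{gamma B}(z)
        g = grad_h(x_b)
        x_a = prox_f([2 * xb - zi - gamma * gi      # step 2: x_A = J_{gamma A}(2 x_B - z - gamma C x_B)
                      for xb, zi, gi in zip(x_b, z, g)], gamma)
        z = [zi + relax * (xa - xb)                 # step 3: relaxed update of z
             for zi, xa, xb in zip(z, x_a, x_b)]
    return prox_g(z, gamma)                         # map the fixed point back to a solution

# Illustrative problem: min  indicator_{[0,1]^n}(x) + lam * ||x||_1 + 0.5 * ||x - b||^2
b, lam = [2.0, 0.3, -1.0, 0.9], 0.5
gamma = 0.8                                         # must lie in (0, 2 * beta) with beta = 1 here
x_star = davis_yin(
    prox_f=lambda v, gam: project_box(v),
    prox_g=lambda v, gam: soft_threshold(v, gam * lam),
    grad_h=lambda x: [xi - bi for xi, bi in zip(x, b)],
    z0=[0.0] * len(b),
    gamma=gamma,
)
expected = project_box([bi - lam for bi in b])      # closed-form solution of this separable problem
```

With $\gamma = 0.8$ and $\beta = 1$, the averagedness constant is $\alpha = 2/(4 - 0.8) = 0.625$, so the choice `relax=1.0` lies inside the admissible interval $(0, 1/\alpha) = (0, 1.6)$.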

Many variants and extensions exist:

Forward–Douglas–Rachford splitting, where the prox evaluations and gradient steps are permuted (Raguet, 2017).
Inertial and momentum-accelerated schemes for accelerated convergence (1904.11684, Qin et al., 18 Nov 2025, Iyiola et al., 2024).
Adaptive step-size variants (e.g., ATOS) leveraging local smoothness for automatic step selection (Pedregosa et al., 2018).
Bregman-distance generalizations tuning the geometry of the proximal subproblems (Jiang et al., 2022).
Extensions for nonconvex settings using modified envelopes and merit functions (Alcantara et al., 10 Apr 2025, Bian et al., 2020, Yurtsever et al., 2021).

3. Convergence Theory and Complexity Results

Under the basic monotonicity and cocoercivity assumptions, three-operator splitting methods admit robust convergence guarantees:

Weak convergence: For the Krasnosel’skiĭ–Mann iteration with suitable relaxation, the sequence $(z^k)$ converges weakly to a fixed point of the Davis–Yin operator $T$, which the resolvent $J_{\gamma B}$ maps to a zero of $A + B + C$ (Davis et al., 2015, 1904.11684).
Rates: In the convex case, nonergodic $o(1/\sqrt{k})$ rates for the normed residual and $O(1/k)$ ergodic rates for function value and variational-inequality error can be established (Davis et al., 2015, Pedregosa, 2016, Wang et al., 2019). With strong monotonicity (e.g., one strongly monotone operator), linear convergence is obtained (Davis et al., 2015, Qin et al., 18 Nov 2025).
Acceleration: Allowing variable stepsizes or adapting parameters further improves rates (optimal $O(1/k^2)$ for strongly monotone inclusions under varying $\gamma_k$) (Davis et al., 2015).
Nonconvex analysis: Generalizations establish subsequential convergence to critical points under suitable energy-decrease conditions, and convergence of the whole sequence under Kurdyka–Łojasiewicz conditions (Alcantara et al., 10 Apr 2025, Bian et al., 2020).
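The behavior of the fixed-point residual can be observed numerically. The sketch below assumes a small separable test problem chosen here for illustration (box indicator, $\ell_1$ penalty, and $h(x) = \tfrac12\|x - b\|^2$, so $\beta = 1$) and records $\|Tz^k - z^k\|$ along the unrelaxed iteration. Because an averaged operator is nonexpansive, this residual cannot increase from one iteration to the next.

```python
import math

def dys_operator(z, b, lam, gamma):
    """One application of the Davis-Yin map T for f = indicator of [0,1]^n,
    g = lam * ||.||_1 and h = 0.5 * ||x - b||^2 (an illustrative choice)."""
    # x_B: prox of gamma * g (soft-thresholding)
    x_b = [math.copysign(max(abs(zi) - gamma * lam, 0.0), zi) for zi in z]
    # x_A: prox of gamma * f (projection onto the box) at the reflected, gradient-shifted point
    x_a = [min(max(2 * xb - zi - gamma * (xb - bi), 0.0), 1.0)
           for xb, zi, bi in zip(x_b, z, b)]
    return [zi + xa - xb for zi, xa, xb in zip(z, x_a, x_b)]

b, lam, gamma = [2.0, 0.3, -1.0, 0.9], 0.5, 0.8   # gamma in (0, 2 * beta), beta = 1
z = [5.0, -3.0, 4.0, -2.0]                        # arbitrary starting point
residuals = []
for _ in range(40):
    z_next = dys_operator(z, b, lam, gamma)
    # Euclidean norm of T z^k - z^k
    residuals.append(math.sqrt(sum((zn - zi) ** 2 for zn, zi in zip(z_next, z))))
    z = z_next
```

On this strongly convex instance the residual decays geometrically, consistent with the linear rate under strong monotonicity. The sublinear rates quoted above are worst-case bounds for the general convex setting.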

A control-theoretic interpretation based on Lyapunov functions and integral quadratic constraints (IQCs) supports rigorous certification of rates and parameter choices: step-size and relaxation parameters can be derived by solving linear matrix inequalities (LMIs) or semidefinite programs (SDPs) (Wang et al., 2019).

4. Structural Flexibility and Special Cases

Three-operator splitting generalizes and recovers numerous classical methods (Davis et al., 2015, Raguet, 2017):

Forward–Backward: $B = 0$ yields the standard FB scheme.
Douglas–Rachford: $C = 0$ yields DRS, with splitting between two maximally monotone operators.
Forward–Douglas–Rachford: Suitable for saddle-point problems with additional constraints.
3-block ADMM: The Davis–Yin scheme reproduces three-block ADMM in dual space, clarifying the role of each sub-problem and enabling block-separable algorithms for multi-term convex programs (Chang et al., 2018, Anshika et al., 2024).
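The forward–backward reduction can be checked directly. In the sketch below (problem data and function names are illustrative assumptions), the second maximally monotone operator is set to zero, so its resolvent is the identity and the unrelaxed Davis–Yin update collapses to $z^{k+1} = J_{\gamma A}(z^k - \gamma C z^k)$, which is the proximal-gradient step.

```python
import math

def soft_threshold(v, t):
    """Prox of t * ||.||_1."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]

b, lam, gamma = [2.0, 0.3, -1.0, 0.9], 0.5, 0.8

def grad_h(x):
    """Gradient of h(x) = 0.5 * ||x - b||^2."""
    return [xi - bi for xi, bi in zip(x, b)]

def res_a(v):
    """Resolvent J_{gamma A} for A = subdifferential of lam * ||.||_1."""
    return soft_threshold(v, gamma * lam)

def res_zero(v):
    """Resolvent of the zero operator: the identity map."""
    return list(v)

def dys_step(z, res_first, res_second):
    """Unrelaxed Davis-Yin update; res_second plays the role of J_{gamma B}."""
    x_b = res_second(z)
    x_a = res_first([2 * xb - zi - gamma * gi
                     for xb, zi, gi in zip(x_b, z, grad_h(x_b))])
    return [zi + xa - xb for zi, xa, xb in zip(z, x_a, x_b)]

def fb_step(x):
    """Forward-backward (proximal gradient) update."""
    return res_a([xi - gamma * gi for xi, gi in zip(x, grad_h(x))])

z = x = [5.0, -3.0, 4.0, -2.0]
max_gap = 0.0
for _ in range(20):
    z, x = dys_step(z, res_a, res_zero), fb_step(x)
    # Largest discrepancy between the two trajectories seen so far
    max_gap = max(max_gap, max(abs(zi - xi) for zi, xi in zip(z, x)))
```

The two trajectories agree up to floating-point rounding. Setting the gradient term to zero instead turns step 2 into $J_{\gamma A}(2x_B^k - z^k)$, which is the Douglas–Rachford update.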

Algorithmic frameworks such as AFBA (Latafat et al., 2016) and recent modifications allow for bounded linear/skew operators and block-preconditioning, further expanding the admissible problem class to composite minimization, primal-dual, and splitting over arbitrary numbers of operators.

5. Practical Enhancements, Adaptivity, and Implementation

Key practical developments include:

Automatic or adaptive step-size strategies (e.g., ATOS, adaptive splitting) which circumvent the need for global smoothness constants and can dramatically speed up convergence compared to fixed-step methods (Pedregosa et al., 2018, Dao et al., 2021).
Momentum and inertial extrapolation (both one-step and two-step) accelerate convergence, obviate strict summability conditions, and provide advantages for large-scale and ill-conditioned problems; two-step schemes have shown improvement over one-step inertia in imaging and regression (1904.11684, Qin et al., 18 Nov 2025, Iyiola et al., 2024).
Bregman and variable-metric splitting use distance-generating functions tailored to problem geometry, facilitating algorithmic preconditioning (Jiang et al., 2022).
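One-step inertial extrapolation can be sketched generically as applying the splitting operator at an extrapolated point $w^k = z^k + \theta (z^k - z^{k-1})$. The code below is an illustration of that idea only, not the specific scheme of any cited work. The test problem and the value $\theta = 0.1$ are assumptions made here, and convergence guarantees for inertial variants depend on parameter conditions given in the cited references.

```python
import math

def dys_operator(z, b, lam, gamma):
    """Davis-Yin map T for f = indicator of [0,1]^n, g = lam * ||.||_1,
    h = 0.5 * ||x - b||^2 (illustrative problem)."""
    x_b = [math.copysign(max(abs(zi) - gamma * lam, 0.0), zi) for zi in z]
    x_a = [min(max(2 * xb - zi - gamma * (xb - bi), 0.0), 1.0)
           for xb, zi, bi in zip(x_b, z, b)]
    return [zi + xa - xb for zi, xa, xb in zip(z, x_a, x_b)]

def inertial_dys(z0, b, lam, gamma, theta, iters=200):
    """Generic one-step inertial iteration: z^{k+1} = T(z^k + theta * (z^k - z^{k-1}))."""
    z_prev = z = list(z0)
    for _ in range(iters):
        w = [zi + theta * (zi - zp) for zi, zp in zip(z, z_prev)]  # extrapolated point
        z_prev, z = z, dys_operator(w, b, lam, gamma)
    # Recover the solution estimate x = J_{gamma B}(z)
    return [math.copysign(max(abs(zi) - gamma * lam, 0.0), zi) for zi in z]

b, lam, gamma = [2.0, 0.3, -1.0, 0.9], 0.5, 0.8
x_inertial = inertial_dys([0.0] * len(b), b, lam, gamma, theta=0.1)
# Closed-form solution of this separable problem: clip(b - lam, 0, 1)
x_exact = [min(max(bi - lam, 0.0), 1.0) for bi in b]
```

Setting `theta=0.0` recovers the plain iteration, which makes it easy to compare the two on a given problem before committing to a momentum parameter.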

Implementation is modular: only the resolvents/proximal steps for each operator and explicit evaluations (e.g., gradients for $h$) are needed. In practice, method selection can be guided by the cost profile of the constituent operators; for a highly unbalanced cost, efficiency gains can be realized by adjusting the relative frequency of the sub-steps (Spiteri et al., 2023).

6. Applications and Numerical Performance

Three-operator splitting is applied to diverse domains:

Large-scale convex optimization: Machine learning (group lasso, total variation), signal and image processing (nuclear norm regularization, matrix/tensor completion), PDEs, and control (box-constrained LQR).
Distributed and stochastic optimization: Mini-batch and distributed variants support localized data, enabling distributed machine learning under heavy-tailed or adversarial stochastic oracles (Franci et al., 2022).
Nonconvex optimization: Adapted algorithms and envelope-based analyses guarantee convergence for structured nonconvex objectives common in modern data science (Alcantara et al., 10 Apr 2025, Bian et al., 2020, Yurtsever et al., 2021).
Numerical advantages: Reported experiments show reductions in iteration counts and wall-clock time ranging from 10–20% (PDE splitting) to an order of magnitude in high-dimensional or degenerate problems (Spiteri et al., 2023, Pedregosa et al., 2018, 1904.11684).

7. Extensions and Future Directions

Ongoing research areas include:

Generalization to more than three operators: Block-coordinate schemes and multi-splitting algorithms, which preserve a single proximal evaluation per operator, underlie large-scale consensus optimization (Raguet, 2017, Latafat et al., 2016).
Adaptive and line-search step-size selection: Parameter-free automatic schemes, suitable for situations with unknown operator characteristics (Pedregosa et al., 2018, Jiang et al., 2022).
Robustness to noise and inexactness: Integration with stochastic and inexact oracles broadens applicability to learning over distributed, federated and noisy environments (Franci et al., 2022).
Further exploration of momentum, extrapolation, and Bregman geometry: To improve rate constants and enable efficient large-scale parallel and distributed implementations (Qin et al., 18 Nov 2025, 1904.11684).
Unified primal-dual and preconditioning perspectives: Leveraging block operator decompositions for composite problems and deep connections to primal–dual and ADMM-type algorithms (Latafat et al., 2016, Anshika et al., 2024).
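One standard route to more than three operators is the product-space reformulation, sketched here for $n$ nonsmooth terms $g_1, \dots, g_n$ (the notation is introduced for illustration and is not taken from the cited works):

$$\min_{x \in \mathcal{H}} \; \sum_{i=1}^{n} g_i(x) + h(x) \quad \Longleftrightarrow \quad \min_{\mathbf{x} = (x_1, \dots, x_n) \in \mathcal{H}^n} \; \iota_D(\mathbf{x}) + \sum_{i=1}^{n} g_i(x_i) + \frac{1}{n} \sum_{i=1}^{n} h(x_i), \qquad D = \{\mathbf{x} : x_1 = \dots = x_n\}.$$

The right-hand side is again a three-term problem. The projection onto $D$ replaces every block by the average $\bar{x} = \frac{1}{n}\sum_{i} x_i$, and the prox of the separable sum splits into one prox per $g_i$, so each term still requires only a single proximal evaluation per iteration.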

The versatility, theoretical guarantees, and empirical performance of three-operator splitting have established it as a foundational methodology in contemporary optimization and computational mathematics.