
Successive Convex Approximation (SCA)

Updated 15 February 2026
  • SCA is a framework that approximates nonconvex optimization problems by iteratively solving locally accurate convex surrogate problems.
  • The method constructs surrogates using first-order Taylor expansions with proximal terms, ensuring strong convexity and matching of value and gradient.
  • SCA adapts to centralized, parallel, distributed, and stochastic settings while providing robust convergence guarantees for complex optimization tasks.

Successive Convex Approximation (SCA) is a general algorithmic framework for solving large-scale, nonconvex, and often constrained optimization problems by iteratively constructing and minimizing a sequence of tractable convex surrogate problems. SCA encompasses a variety of classical and recent algorithms designed for settings in which the direct solution of the original problem is impractical or intractable, including large-scale machine learning, wireless communications, signal processing, portfolio optimization, and reinforcement learning. The central principle is to locally replace the nonconvex objective and constraint functions with convex surrogates that match the original functions' value and gradient at the current iterate, then use the minimizer of this convex subproblem as a new candidate iterate.

1. Framework and Surrogate Design Principles

SCA is defined for generic nonconvex programs of the form

\min_{x \in \mathcal{X}} \; F_0(x) \quad \text{s.t.} \quad F_i(x) \le 0, \ i = 1, \ldots, m

where \mathcal{X} is a compact convex set and each F_i(x) is smooth but possibly nonconvex, and may be defined as an expectation. The central SCA routine builds at each iteration k convex surrogate functions \tilde f_i(x; x^k) for each F_i, subject to the following conditions:

  • Touching: \tilde f_i(x^k; x^k) = F_i(x^k);
  • First-order agreement: \nabla_x \tilde f_i(x^k; x^k) = \nabla F_i(x^k);
  • Strong convexity: x \mapsto \tilde f_i(x; x^k) is \mu_i-strongly convex for some \mu_i > 0.

Common surrogates include first-order Taylor expansions plus proximal terms (prox-linear schemes): \tilde f_i(x; x^k) = F_i(x^k) + \nabla F_i(x^k)^\top (x - x^k) + \frac{L_i}{2} \|x - x^k\|^2, with L_i a Lipschitz constant of \nabla F_i. In expectation-constrained or stochastic settings, recursive averaging constructs running surrogates (e.g., empirical means or exponential moving averages) (Ye et al., 2019, Liu et al., 2018).

These surrogates are chosen so that the resulting subproblem is convex and computationally tractable, and so that the surrogate sequence approximates the local behavior of the original problem sufficiently well to ensure descent and asymptotic stationarity.
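As a concrete illustration (a minimal sketch under assumed names, not code from the cited papers), consider the prox-linear surrogate over a box constraint: its minimizer has closed form, so the SCA loop reduces to a projected-gradient-style update:

```python
import numpy as np

def sca_prox_linear(grad, x0, lo, hi, L, iters=200):
    """Minimal SCA loop with a prox-linear surrogate over a box [lo, hi].

    At iterate x^k the surrogate F(x^k) + grad(x^k)^T (x - x^k)
    + (L/2)||x - x^k||^2 is minimized over the box; its minimizer is the
    projected gradient step clip(x^k - grad(x^k)/L, lo, hi).
    """
    x = x0.copy()
    for _ in range(iters):
        x = np.clip(x - grad(x) / L, lo, hi)  # exact surrogate minimizer
    return x

# Toy nonconvex objective F(x) = sum_j (x_j^4 - 3 x_j^2) on the box [-1, 1]^2;
# on this box the minimizers lie at the boundary x_j = +/-1.
grad = lambda x: 4 * x**3 - 6 * x
x_star = sca_prox_linear(grad, np.array([0.3, -0.2]), -1.0, 1.0, L=18.0)
# x_star -> approximately [1.0, -1.0]
```

Because the surrogate here majorizes the objective (L is at least the gradient's Lipschitz constant on the box), each iteration decreases F, matching the descent property described above.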

2. Centralized, Parallel, and Distributed SCA Schemes

SCA is naturally adaptable to centralized, parallel, and distributed computational environments:

  • Centralized (single-node) SCA: At each iteration, a global convex surrogate is constructed and minimized, updating the full variable vector. Stepsize rules (constant, diminishing, or exact line search) can be incorporated to ensure stability and convergence (Scutari et al., 2018).
  • Parallel (Block-Structured/Jacobi or Gauss-Seidel) SCA: The variable is decomposed into blocks or coordinates, and block-local surrogates are minimized either in parallel (Jacobi-SCA) or sequentially, with each block using the most recent updates of the preceding blocks (Gauss-Seidel-SCA) (Scutari et al., 2018). This yields significant speedups on multicore or GPU systems and preserves convergence under standard strong convexity and Lipschitz conditions.
  • Distributed and Decentralized SCA: The optimization domain is partitioned across a network of agents (nodes), each of which maintains a local copy of variables and objective terms. SCA is embedded within a consensus or primal-dual ADMM protocol, where local agents minimize their own surrogate-augmented subproblems and communicate with neighbors (Kumar et al., 2019, Lorenzo et al., 2020, Scutari et al., 2018, Tian et al., 2019). The distributed SCA extends to asynchronous, delay-tolerant, and inexact computation regimes. For stochastic/nonconvex consensus problems with constraints and nonsmooth regularizers, decentralized momentum-based SCA schemes attain the best-known SFO complexity of \mathcal{O}(\epsilon^{-3/2}) for \epsilon-stationarity (Idrees et al., 2024).
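A minimal Jacobi-style sketch (hypothetical, with illustrative names, not drawn from the cited references): every block builds its prox-linear surrogate at the same shared iterate x^k, so the block updates are mutually independent and could be dispatched in parallel:

```python
import numpy as np

def jacobi_block_sca(grad_blocks, x0, L, iters=300):
    """Jacobi (parallel) block SCA sketch.

    Each block b minimizes its own prox-linear surrogate built at the
    shared iterate x^k; since no block sees another block's new value
    within an iteration, all updates can run simultaneously (here they
    are simply computed in a loop).
    """
    x = [b.copy() for b in x0]
    for _ in range(iters):
        # All block gradients are evaluated at the SAME iterate x^k (Jacobi).
        g = [grad_blocks[b](x) for b in range(len(x))]
        # Each block takes its surrogate-minimizing step x_b - g_b / L.
        x = [x[b] - g[b] / L for b in range(len(x))]
    return x

# Toy coupled objective F(x1, x2) = ||x1||^2 + ||x2||^2 + x1 . x2
# (convex here for simplicity; the scheme applies unchanged to nonconvex F).
g1 = lambda x: 2 * x[0] + x[1]
g2 = lambda x: 2 * x[1] + x[0]
x1, x2 = jacobi_block_sca([g1, g2], [np.ones(3), -np.ones(3)], L=4.0)
# x1, x2 -> approximately the minimizer (0, 0)
```

A Gauss-Seidel variant would instead feed each freshly updated block into the gradients of the blocks that follow it, trading parallelism for potentially faster per-sweep progress.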

3. Stochastic and Expectation-Constrained SCA

Stochastic SCA methods address problems of the form

\min_{x \in \mathcal{X}} \; \mathbb{E}_\xi [g_0(x, \xi)] \quad \text{s.t.} \quad \mathbb{E}_\xi [g_i(x, \xi)] \le 0, \ i = 1, \ldots, m

with randomness in both objective and constraints. Classic stochastic gradient methods fail to guarantee feasibility because constraints are only satisfied in expectation. SCA resolves this by penalizing the expected constraint violation,

\min_{x \in \mathcal{X}} \; \mathbb{E}_\xi[g_0(x, \xi)] + \lambda \sum_i \big[ \mathbb{E}_\xi[g_i(x, \xi)] \big]_+

where [\cdot]_+ = \max\{\cdot, 0\}, and operates on iteratively refined convex surrogates of the stochastic terms, updated via sampled gradients and moving averages. The method ensures almost-sure convergence to a stationary point for sufficiently large \lambda, and incorporates relaxation steps and damped stepsizes satisfying Robbins–Monro conditions (Ye et al., 2019, Liu et al., 2018, Idrees et al., 2024).

Parallel stochastic SCA approaches further leverage block-wise decomposition of variables/constraints to enable simultaneous, independent optimization, crucial for large-scale multi-agent or parameter-server architectures (Liu et al., 2018, Ye et al., 2019).
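As a toy sketch of the recursive-averaging idea (illustrative only; the decay exponents, the noise scale, and the quadratic test problem are assumptions, not taken from the cited papers):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_sca(sample_grad, x0, L, iters=2000):
    """Stochastic SCA sketch with recursive gradient averaging.

    The surrogate gradient d^k is an exponential moving average of
    sampled gradients, and the iterate is relaxed toward the prox-linear
    surrogate minimizer with a damped (Robbins-Monro-style) stepsize.
    """
    x, d = x0.copy(), np.zeros_like(x0)
    for k in range(1, iters + 1):
        rho = 1.0 / k**0.6           # averaging weight for the surrogate
        gamma = 1.0 / k**0.9         # damped relaxation stepsize
        d = (1 - rho) * d + rho * sample_grad(x)
        x_hat = x - d / L            # minimizer of the prox-linear surrogate
        x = x + gamma * (x_hat - x)  # relaxation step toward the minimizer
    return x

# Toy stochastic objective: E_xi[ 0.5 ||x - xi||^2 ] with xi ~ N(mu, 0.09 I);
# its unique stationary point is x = mu.
mu = np.array([1.0, -2.0])
sample_grad = lambda x: x - (mu + 0.3 * rng.standard_normal(2))
x_est = stochastic_sca(sample_grad, np.zeros(2), L=1.0)
```

The moving average suppresses sampling noise in the surrogate while the damped stepsize prevents the relaxation step from overreacting to any single sample, mirroring the almost-sure convergence mechanism described above.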

4. Theoretical Guarantees and Complexity

Under compactness, smoothness, and surrogate consistency assumptions, SCA exhibits robust global convergence properties:

  • Every limit point of the iterate sequence is a stationary point of the original nonconvex problem or its penalized equivalent (under constraint qualification).
  • Descent properties are ensured by the strong convexity and first-order agreement of the surrogates, via global Lyapunov (descent) arguments in the deterministic case and martingale convergence arguments in the stochastic case.
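For instance, with the prox-linear surrogate and L_0 at least the Lipschitz constant of \nabla F_0, the surrogate majorizes the objective, so the per-iteration descent is explicit:

F_0(x^{k+1}) \le \tilde f_0(x^{k+1}; x^k) \le \tilde f_0(x^k; x^k) - \frac{L_0}{2} \|x^{k+1} - x^k\|^2 = F_0(x^k) - \frac{L_0}{2} \|x^{k+1} - x^k\|^2

where the middle inequality uses the L_0-strong convexity of the surrogate at its minimizer x^{k+1}; consequently the objective values decrease monotonically and \|x^{k+1} - x^k\| \to 0 on the compact set \mathcal{X}.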
