Decentralized SCA Momentum-based Prox-Linear (D-SCAMPL)
- The paper introduces D-SCAMPL, which reformulates constrained nonconvex problems using an exact-penalty approach to avoid costly projections.
- It employs successive convex approximation and local prox-linear surrogates with momentum-based variance reduction for efficient decentralized optimization.
- The method achieves optimal sample and communication complexity while ensuring feasibility and consensus across agents in multi-agent networks.
Decentralized SCA Momentum-based Prox-Linear (D-SCAMPL) is a distributed algorithmic framework for consensus-based stochastic optimization in multi-agent networks, targeting nonconvex objectives with convex nonsmooth regularization and complex functional constraints. D-SCAMPL leverages successive convex approximation (SCA), momentum-based variance-reduction, and local prox-linearization to achieve efficient decentralized constrained optimization via only local stochastic gradients and constraint information, without requiring global projections or multiple consensus rounds per iteration (Sharma et al., 28 Jan 2026).
1. Problem Setting and Motivation
D-SCAMPL addresses decentralized stochastic optimization over undirected networks of $n$ agents, where the agents collectively solve
$$\min_{x \in \mathbb{R}^d} \; \frac{1}{n}\sum_{i=1}^n f_i(x) + h(x) \quad \text{subject to} \quad g_j(x) \le 0, \; j = 1, \dots, m,$$
with $f_i$ denoting a possibly nonconvex smooth local objective, $h$ a convex nonsmooth regularizer, and $g_j$ smooth convex (possibly nonlinear) inequality constraints. Each agent $i$ has access only to local data and first-order oracle calls for $f_i$ and the $g_j$. Explicit projection onto the feasible set is assumed intractable; thus, the algorithm must avoid subproblems requiring such projections.
The central challenge is to design an algorithm that (i) handles stochastic gradients and variance, (ii) ensures feasibility with nonlinear constraints, (iii) is communication-efficient (few rounds per iteration), and (iv) achieves the optimal sample/communication complexity scaling for nonconvex decentralized problems.
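To make the problem class concrete, the following sketch builds a toy instance with smooth local least-squares losses $f_i$, an $\ell_1$ regularizer as the nonsmooth $h$, and a norm-ball constraint as the smooth convex $g$. All names, dimensions, and values are illustrative choices, not data from the paper:

```python
import numpy as np

# Hypothetical toy instance of the problem class:
#   min_x (1/n) sum_i f_i(x) + h(x)   s.t.  g(x) <= 0
# with smooth local f_i, convex nonsmooth h, and smooth convex g.
rng = np.random.default_rng(0)
n_agents, dim = 4, 3
A = [rng.standard_normal((5, dim)) for _ in range(n_agents)]
b = [rng.standard_normal(5) for _ in range(n_agents)]

def f_i(i, x):
    """Smooth local loss built from agent i's private data."""
    return 0.5 * np.sum((A[i] @ x - b[i]) ** 2)

def grad_f_i(i, x):
    """First-order oracle for f_i: only agent i can evaluate this."""
    return A[i].T @ (A[i] @ x - b[i])

def h(x):
    """Convex nonsmooth regularizer (l1 norm)."""
    return 0.1 * np.sum(np.abs(x))

def g(x):
    """Smooth convex inequality constraint: ||x||^2 <= 1."""
    return np.dot(x, x) - 1.0

x0 = np.zeros(dim)   # feasible starting point: g(x0) = -1 <= 0
```

Projecting onto $\{x : g(x) \le 0\}$ is easy for this toy ball constraint, but for general nonlinear $g_j$ it is not, which is what motivates the projection-free design below.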
2. Algorithmic Framework
D-SCAMPL follows an SCA-proximal linearization approach. The key mechanisms are:
- Exact-Penalty Reformulation: The constrained problem is equivalently rewritten via the penalty function, introducing a penalty parameter $\lambda > 0$ and a penalty slack such that
$$\min_{x} \; \frac{1}{n}\sum_{i=1}^n f_i(x) + h(x) + \lambda\,\max\{0,\, g_1(x), \dots, g_m(x)\}.$$
This formulation avoids complex projections and enables subgradient control via the max-penalty term.
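A minimal sketch of this max-penalty surrogate, with placeholder `f`, `h`, and constraint list `gs` standing in for the paper's quantities:

```python
import numpy as np

def penalized_objective(x, f, h, gs, lam):
    """Exact-penalty surrogate: f(x) + h(x) + lam * max(0, g_1(x), ..., g_m(x)).

    For lam large enough, minimizers of this unconstrained problem
    coincide with minimizers of the original constrained problem.
    """
    violation = max(0.0, max(g(x) for g in gs))
    return f(x) + h(x) + lam * violation

# Toy check: quadratic objective, l1 regularizer, one linear constraint.
f = lambda x: 0.5 * np.dot(x, x)
h = lambda x: 0.1 * np.sum(np.abs(x))
gs = [lambda x: np.sum(x) - 1.0]     # sum(x) <= 1

x_feas = np.array([0.2, 0.3])        # feasible: penalty term vanishes
x_infeas = np.array([1.0, 1.0])      # infeasible: pays lam * violation
```

At feasible points the penalty term is zero, so the surrogate agrees with the original composite objective; infeasible points are charged proportionally to the worst constraint violation.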
- SCA-Prox-Linear Surrogate: At each iteration $t$ and agent $i$, D-SCAMPL constructs a strongly convex surrogate based on linearization of $f_i$ and $g_j$ around the current iterate $x_i^t$. The resulting subproblem is a quadratic program with linearized constraints and proximal regularization, solvable efficiently.
- Momentum-Based Variance Reduction: The local gradient estimator $v_i^t$ is recursively updated, STORM-style, as
$$v_i^t = \nabla f_i(x_i^t;\, \xi_i^t) + (1-\beta)\left(v_i^{t-1} - \nabla f_i(x_i^{t-1};\, \xi_i^t)\right),$$
enabling variance reduction and momentum effects that accelerate convergence.
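This recursion is a STORM-type estimator; a sketch under the assumption that a fresh sample $\xi_i^t$ is drawn each step and used in both gradient evaluations (all names illustrative):

```python
import numpy as np

def storm_update(v_prev, grad_new, grad_old_same_sample, beta):
    """Momentum-based variance-reduced estimator:
        v_t = grad f(x_t; xi_t) + (1 - beta) * (v_{t-1} - grad f(x_{t-1}; xi_t)).

    Both gradients use the SAME fresh sample xi_t, so the correction term
    cancels sampling noise common to consecutive iterates; beta = 1 recovers
    the plain stochastic gradient.
    """
    return grad_new + (1.0 - beta) * (v_prev - grad_old_same_sample)

# Sanity check on a noiseless quadratic f(x) = 0.5 ||x||^2 (grad f(x) = x):
x_old, x_new = np.array([1.0, 2.0]), np.array([0.5, 1.0])
v_old = x_old                                   # exact gradient at x_old
v_new = storm_update(v_old, x_new, x_old, beta=0.1)
# With exact gradients the estimator reproduces grad f(x_new) exactly.
```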
- Distributed Consensus: Each iteration involves two rounds of weighted averaging (mixing via a symmetric, doubly stochastic matrix $W$), one for the primal variable and one for the gradient-tracking estimator, ensuring network-wide approximate consensus without centralized coordination.
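A single mixing round is just a $W$-weighted average over neighbors. The sketch below uses Metropolis-style ring weights as one illustrative choice of symmetric, doubly stochastic $W$; any $W$ with $W\mathbf{1}=\mathbf{1}$ and $W=W^\top$ works:

```python
import numpy as np

# Symmetric doubly stochastic mixing matrix for a 4-agent ring
# (Metropolis-style weights; illustrative, not from the paper).
W = np.array([
    [0.5,  0.25, 0.0,  0.25],
    [0.25, 0.5,  0.25, 0.0 ],
    [0.0,  0.25, 0.5,  0.25],
    [0.25, 0.0,  0.25, 0.5 ],
])

def mix(W, X):
    """One consensus round: agent i replaces its row X[i] with a
    W-weighted average of its own and its neighbors' values."""
    return W @ X

X = np.array([[4.0], [0.0], [2.0], [-2.0]])   # one scalar state per agent
for _ in range(50):
    X = mix(W, X)
# Doubly stochastic mixing preserves the network average (here 1.0) and
# drives all agents toward it at a rate set by the spectral gap of W.
```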
3. Detailed Per-Iteration Mechanisms
At each iteration $t$ and for each agent $i$:
- SCA Prox-Linear Subproblem:
  - Formulate and solve
$$\hat{x}_i^t = \arg\min_{x} \; \langle v_i^t,\, x - x_i^t \rangle + h(x) + \frac{\rho}{2}\,\|x - x_i^t\|^2$$
  subject to the linearized constraints
$$g_j(x_i^t) + \langle \nabla g_j(x_i^t),\, x - x_i^t \rangle \le 0, \qquad j = 1, \dots, m.$$
  - Update $x_i^{t+1/2} = x_i^t + \gamma\,(\hat{x}_i^t - x_i^t)$ with step size $\gamma \in (0,1]$.
- Consensus Averaging (Primal): $x_i^{t+1} = \sum_{j=1}^n W_{ij}\, x_j^{t+1/2}$.
- Variance-Reduced Gradient Update: refresh $v_i^{t+1}$ via the momentum recursion above.
- Consensus Averaging (Gradient-Tracking): $y_i^{t+1} = \sum_{j=1}^n W_{ij}\left(y_j^t + v_j^{t+1} - v_j^t\right)$.
Communication per iteration is limited to two rounds of weighted averaging—one for the variable and one for the dual/gradient estimator (Sharma et al., 28 Jan 2026).
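Putting the steps together, here is a self-contained sketch on a toy strongly convex instance with a single linear constraint, where the prox-linear subproblem reduces to a closed-form halfspace projection. All constants, the uniform mixing matrix, and the use of exact gradients in place of the momentum-based stochastic estimator are simplifying assumptions for illustration:

```python
import numpy as np

# Toy instance: f_i(x) = 0.5 ||x - a_i||^2, h = 0, g(x) = x1 + x2 - 1 <= 0.
# The constrained minimizer of the average objective is x* = (0.5, 0.5).
a = np.array([[1.0, 1.0], [2.0, 0.0], [0.0, 2.0]])
n, dim = a.shape
W = np.full((n, n), 1.0 / n)     # illustrative doubly stochastic mixing matrix
c = np.ones(dim)                 # constraint gradient: g(x) = c @ x - 1
rho, gamma, T = 1.0, 1.0, 20

def grad(i, x):
    """Exact local gradient (stand-in for the variance-reduced estimator)."""
    return x - a[i]

X = np.zeros((n, dim))                               # local iterates
Y = np.array([grad(i, X[i]) for i in range(n)])      # gradient-tracking vars

for _ in range(T):
    X_half = np.empty_like(X)
    for i in range(n):
        # Prox-linear subproblem: min <y_i, d> + (rho/2)||d||^2
        # s.t. g(x_i) + c @ d <= 0  ->  halfspace projection of -y_i / rho.
        d = -Y[i] / rho
        slack = (c @ X[i] - 1.0) + c @ d
        if slack > 0:
            d -= (slack / (c @ c)) * c
        X_half[i] = X[i] + gamma * d
    G_old = np.array([grad(i, X[i]) for i in range(n)])
    X = W @ X_half                                   # primal consensus round
    G_new = np.array([grad(i, X[i]) for i in range(n)])
    Y = W @ (Y + G_new - G_old)                      # gradient-tracking round

x_bar = X.mean(axis=0)   # all agents reach consensus near x* = (0.5, 0.5)
```

Note the two communication rounds per iteration (the two `W @ ...` products), matching the pattern described above; everything else uses only local information.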
4. Convergence Properties and Complexity
D-SCAMPL achieves a stochastic first-order oracle complexity and communication complexity of $O(\epsilon^{-3})$, matching the optimal rate for unconstrained nonconvex centralized stochastic problems under standard smoothness and regularity assumptions. The main convergence theorem ensures that, after $T = O(\epsilon^{-3})$ iterations (with constants depending on the spectral gap of $W$ and the gradient variance $\sigma^2$), the output is an $\epsilon$-approximate KKT solution:
- Consensus error: $\frac{1}{n}\sum_{i=1}^n \|x_i - \bar{x}\| \le \epsilon$,
- Proximity: each local subproblem need only be solved to $\epsilon$-accuracy (for inexact subproblems),
- Stationarity and feasibility (with high probability): the stationarity residual at the averaged iterate is at most $\epsilon$, and $\max_j\, [g_j(\bar{x})]_+ \le \epsilon$.
No agent needs projections onto nonlinear constraints; only local quadratic programs with linearized constraints are solved per step, and communication is minimized (Sharma et al., 28 Jan 2026).
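These stopping quantities are straightforward to evaluate once the stacked iterates are gathered; a minimal sketch (function name and array shapes are illustrative, not from the paper):

```python
import numpy as np

def epsilon_kkt_metrics(X, gs):
    """Consensus error and constraint violation for stacked iterates X (n x d).

    Returns the mean distance of local iterates from their network average,
    and the worst positive constraint value at that average.
    """
    x_bar = X.mean(axis=0)
    consensus_err = np.mean(np.linalg.norm(X - x_bar, axis=1))
    violation = max(0.0, max(g(x_bar) for g in gs))
    return consensus_err, violation

# Example: three agents already in consensus at a feasible point.
X = np.array([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]])
gs = [lambda x: np.sum(x) - 1.0]
ce, viol = epsilon_kkt_metrics(X, gs)   # -> (0.0, 0.0)
```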
5. Comparison to Related Methods
D-SCAMPL extends and outperforms prior decentralized stochastic composite optimization techniques:
| Algorithm | Constraints | Consensus | Regularizer | Sample Complexity | Communication |
|---|---|---|---|---|---|
| D-SCAMPL | Nonlinear | Two rounds | Nonsmooth ($h(x)$) | $O(\epsilon^{-3})$ | $O(\epsilon^{-3})$ |
| DEEPSTORM (Mancino-Ball et al., 2022) | None | Two rounds | Nonsmooth ($r(x)$) | $O(\epsilon^{-3})$ | $O(\epsilon^{-3})$ |
| D-PSGD [14] | None | One round | Smooth only | $O(\epsilon^{-4})$ | $O(\epsilon^{-4})$ |
| D-MSSCA | Feasibility | Multi-round | Smooth, SCA | Higher | Higher |
D-SCAMPL is the only approach to date that efficiently handles nonlinear constraints and nonsmooth regularization in the decentralized nonconvex stochastic regime without requiring expensive global projections or numerous consensus steps per iteration. In contrast, methods such as DEEPSTORM (Mancino-Ball et al., 2022) target composite but unconstrained problems, while D-MSSCA and similar SCA-based approaches require more communication or complex constraint-handling.
6. Empirical Results and Practical Considerations
Numerical experiments conducted on energy-optimal ocean trajectory planning—involving four unmanned surface vehicles (USVs) coordinated by networked users and governed by complex stochastic ocean-current models—demonstrate robust performance:
- D-SCAMPL (and its base variant D-SMPL) attain KKT-residual and constraint violation decay rates matching the theoretical complexity.
- Iterations impose only small quadratic subproblems (from linearized constraints), improving per-iteration runtime.
- Only two consensus rounds per iteration are necessary, resulting in 2–5× lower wall-clock times to within a given feasibility/objective threshold compared to existing constrained decentralized baselines.
- No feasibility projections are required during optimization (Sharma et al., 28 Jan 2026).
A plausible implication is that D-SCAMPL provides practical scalability for distributed learning and control in networked systems subject to realistic nonlinear constraints.
7. Summary and Outlook
D-SCAMPL constitutes a significant advance in decentralized constrained nonconvex stochastic optimization. Through exact-penalty reformulation, momentum-variance-reduction, and local prox-linear surrogate minimization—combined with SCA methodology—it achieves optimal complexity rates, low communication load, and broad applicability to real-world decentralized learning under general constraint structures. By operating entirely via local first-order information and gradient tracking, it circumvents the primary communication and projection bottlenecks associated with prior art (Sharma et al., 28 Jan 2026).
The contrast between D-SCAMPL and methods such as DEEPSTORM (Mancino-Ball et al., 2022) suggests that future research may further unify advances in decentralized composite, constrained, and communication-efficient stochastic optimization.