
Decentralized Stochastic Momentum Prox-Linear

Updated 30 January 2026
  • The paper demonstrates that D-SMPL integrates exact-penalty reformulation with prox-linearization and STORM momentum to achieve provably optimal oracle complexity.
  • It employs a two-round consensus gradient tracking protocol, ensuring robust decentralized convergence through effective variance reduction and constraint handling.
  • Numerical experiments validate that D-SMPL reduces iteration time and improves constraint satisfaction compared to baseline methods.

The Decentralized Stochastic Momentum-based Prox-Linear Algorithm (D-SMPL) addresses the problem of consensus-based decentralized stochastic optimization involving non-convex expected objectives with convex non-smooth regularizers and nonlinear functional inequality constraints. Each agent operates without central coordination, is restricted to querying local stochastic gradient and constraint information, and communicates through neighbor averaging via a doubly stochastic mixing matrix. D-SMPL integrates a prox-linearization of nonlinear constraints, an exact-penalty model for constraint handling, STORM-style momentum for variance reduction, and a two-round consensus-based gradient tracking protocol, achieving provably optimal complexity for this class of decentralized problems (Sharma et al., 28 Jan 2026).

1. Problem Formulation and Exact-Penalty Reformulation

Consider an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ of $n$ agents, each with a private stochastic component $f_i(x)=\mathbb{E}_{\xi_i}[f_i(x,\xi_i)]$, a common convex regularizer $h(x)$ (possibly nonsmooth), and $m$ shared smooth convex nonlinear constraints $g_k(x)\le 0$ ($k=1,\dots,m$). The global consensus-optimization task is

$$\min_{x\in\mathbb{R}^d} F(x) := \frac{1}{n}\sum_{i=1}^n f_i(x) + h(x) \quad \text{subject to} \quad g_k(x)\le 0 \;\;\forall k.$$

No central node exists; communication is performed via neighbor averaging defined by a symmetric, doubly stochastic mixing matrix $W$. The problem is recast using an exact-penalty model with parameter $\gamma>0$:

$$\min_{x\in\mathbb{R}^d}\;\Bigl\{f(x)+h(x)+\gamma\max_{k=1,\dots,m}[g_k(x)]_+\Bigr\},$$

which is equivalent to the slack-variable form with scalar slack $\nu\ge 0$:

$$F_c(x)=\min_{\nu\ge0}\;\bigl\{f(x)+h(x)+\gamma\nu\bigr\}\quad\text{s.t. } g_k(x)\le\nu\;\;\forall k.$$

For $\gamma$ sufficiently large and under a strong Slater condition, stationary points of this penalized surrogate correspond to KKT points of the original problem.
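The exact-penalty behavior can be sanity-checked on a toy one-dimensional problem (all functions below are hypothetical examples, not from the paper): below the penalty threshold the surrogate's minimizer violates the constraint, while a sufficiently large $\gamma$ recovers the constrained solution exactly.

```python
import numpy as np

# Toy 1-D instance (hypothetical, for illustration only):
# f(x) = (x - 2)^2, h = 0, one constraint g(x) = x - 1 <= 0.
# Unconstrained minimum is x = 2 (infeasible); constrained minimum is x = 1.
f = lambda x: (x - 2.0) ** 2
g = lambda x: x - 1.0

def penalized(x, gamma):
    # Exact-penalty surrogate: f(x) + h(x) + gamma * max_k [g_k(x)]_+
    return f(x) + gamma * max(g(x), 0.0)

xs = np.linspace(-1.0, 3.0, 4001)          # grid with step 0.001
minimizers = {}
for gamma in (0.5, 4.0):
    vals = [penalized(x, gamma) for x in xs]
    minimizers[gamma] = float(xs[int(np.argmin(vals))])

# Below the threshold the penalty is not exact (minimizer ~ 1.75); once
# gamma exceeds |f'(1)| = 2, the minimizer lands on the constraint boundary.
print(minimizers)
```

The threshold here is the magnitude of the objective's gradient at the constrained solution, matching the "$\gamma$ sufficiently large" requirement above.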

2. Algorithmic Workflow

D-SMPL employs local copies of primal iterates ($x_i^t$), momentum estimators ($z_i^t$), and gradient trackers ($y_i^t$) at each agent. Each iteration comprises two communication steps (consensus rounds), separated by a local quadratic program (QP) solve and a stochastic gradient update.

Iteration steps (per agent $i$, at step $t$):

  1. Prox-linear subproblem: compute $\tilde{x}_i^t$ by solving

$$\min_{x,\,\nu\ge0}\ \langle y_i^t, x\rangle + h(x) + \frac{1}{2\eta}\|x-x_i^t\|^2 + \gamma\nu$$

subject to

$$g_k(x_i^t)+\langle\nabla g_k(x_i^t),\,x-x_i^t\rangle\le\nu \quad \forall k=1,\dots,m.$$

  2. Consensus round 1 (primal averaging):

$$x_i^{t+1}=\sum_{j=1}^n W_{ij}\,\tilde{x}_j^t.$$

  3. Momentum-based gradient update (STORM recursion):

$$z_i^{t+1}=\nabla f_i(x_i^{t+1},\xi_i^{t+1})+(1-\beta)\bigl[z_i^t-\nabla f_i(x_i^t,\xi_i^{t+1})\bigr].$$

  4. Consensus round 2 (gradient tracking):

$$y_i^{t+1}=\sum_{j=1}^n W_{ij}\,y_j^t+\bigl(z_i^{t+1}-z_i^t\bigr).$$

The algorithm outputs an iterate $x_i^t$ chosen uniformly at random from $t=1,\dots,T$.
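The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the local losses are toy quadratics, there are no constraints and $h=0$, so step 1's QP collapses to the closed form $\tilde{x}_i = x_i - \eta y_i$; the step size, momentum, noise level, and ring topology are all arbitrary choices.

```python
import numpy as np

# Minimal sketch of the D-SMPL iteration skeleton on a toy problem
# (quadratic local losses, no constraints, h = 0; all constants hypothetical).
rng = np.random.default_rng(0)
n, d, T = 5, 3, 3000
eta, beta, sigma = 0.05, 0.1, 0.1
targets = rng.normal(size=(n, d))          # f_i(x) = 0.5 * ||x - a_i||^2
opt = targets.mean(axis=0)                 # minimizer of the average loss

def stoch_grad(i, x, noise):
    # Stochastic gradient of f_i at x; STORM reuses one sample per step
    return x - targets[i] + sigma * noise

# Ring-topology mixing matrix: symmetric and doubly stochastic
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

x = np.zeros((n, d))
z = np.stack([stoch_grad(i, x[i], rng.normal(size=d)) for i in range(n)])
y = z.copy()
for t in range(T):
    x_tilde = x - eta * y                              # 1. "QP" step (closed form here)
    x_new = W @ x_tilde                                # 2. consensus round 1
    xi = rng.normal(size=(n, d))                       # one fresh sample per agent
    z_new = np.stack([stoch_grad(i, x_new[i], xi[i])
                      + (1 - beta) * (z[i] - stoch_grad(i, x[i], xi[i]))
                      for i in range(n)])              # 3. STORM recursion
    y = W @ y + (z_new - z)                            # 4. consensus round 2 (tracking)
    x, z = x_new, z_new

err = float(np.linalg.norm(x - opt, axis=1).max())
print("max distance to optimum:", err)
```

On this toy instance the agents both reach consensus and approach the global minimizer, illustrating how the two consensus rounds cooperate with the momentum recursion.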

3. Principal Components and Assumptions

3.1 Prox-linear Subproblem Structure

Each per-iteration subproblem is a linearly constrained quadratic program (QP), owing to the linearization of the nonlinear $g_k$ about $x_i^t$. When $h$ is piecewise-linear or quadratic ($\ell_1$, elastic net, total variation), the QP remains tractable for standard solvers. Warm-starting and exploiting constraint sparsity further speed up the subproblem solves.
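One hedged way to solve a single agent's subproblem (here with $h=0$ and synthetic data standing in for the local linearization) is a generic solver such as SciPy's SLSQP, since the objective is a convex quadratic and all constraints are linear in $(x,\nu)$; a dedicated QP solver would be faster in practice.

```python
import numpy as np
from scipy.optimize import minimize

# One prox-linear subproblem with h = 0 and synthetic linearization data
# (x_t, y_t, g_val, g_grad are placeholders for an agent's local quantities).
d, m, eta, gamma = 3, 2, 0.5, 5.0
rng = np.random.default_rng(1)
x_t = rng.normal(size=d)                 # current iterate x_i^t
y_t = rng.normal(size=d)                 # gradient tracker y_i^t
g_val = rng.normal(size=m)               # constraint values g_k(x_i^t)
g_grad = rng.normal(size=(m, d))         # constraint gradients at x_i^t

def obj(w):                              # decision variable w = (x, nu)
    x, nu = w[:d], w[d]
    return y_t @ x + np.sum((x - x_t) ** 2) / (2 * eta) + gamma * nu

# Linearized constraints g_k(x_t) + <grad g_k, x - x_t> <= nu, plus nu >= 0
cons = [{"type": "ineq",
         "fun": lambda w, k=k: w[d] - g_val[k] - g_grad[k] @ (w[:d] - x_t)}
        for k in range(m)]
cons.append({"type": "ineq", "fun": lambda w: w[d]})

w0 = np.concatenate([x_t, [max(0.0, g_val.max())]])   # feasible warm start
res = minimize(obj, w0, method="SLSQP", constraints=cons)
x_new, nu_new = res.x[:d], float(res.x[d])
print("converged:", res.success, "slack:", round(nu_new, 4))
```

The warm start $(x_i^t, \max_k [g_k(x_i^t)]_+)$ is always feasible for the linearized constraints, which is what makes warm-starting attractive across iterations.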

3.2 Stochastic Momentum and Gradient Tracking

The recursion for $z_i^t$ implements a STORM-style estimator, crucial for variance reduction under stochastic gradients. The two consensus rounds ensure both average agreement among agents (on $x$ and $y$) and robust tracking of the network-wide gradient estimate, enabling convergence even in fully decentralized, data-heterogeneous scenarios (Mancino-Ball et al., 2022).
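The tracking property can be verified numerically: because $W$ is doubly stochastic, the update preserves the network average, so the trackers $y$ always average to the same value as the momentum estimates $z$ when initialized with $y^0=z^0$ (toy sketch; the ring topology and random $z$ sequence are assumptions for illustration).

```python
import numpy as np

# Gradient-tracking invariant: with doubly stochastic W and y^0 = z^0,
# mean_i y_i^t == mean_i z_i^t at every step, however z evolves.
rng = np.random.default_rng(2)
n, d = 4, 2
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

z = rng.normal(size=(n, d))
y = z.copy()                               # initialize y^0 = z^0
gaps = []
for t in range(10):
    z_new = rng.normal(size=(n, d))        # arbitrary new momentum estimates
    y = W @ y + (z_new - z)                # tracking update (consensus round 2)
    z = z_new
    gaps.append(float(np.abs(y.mean(axis=0) - z.mean(axis=0)).max()))
print("max deviation from invariant:", max(gaps))
```

This conservation property is exactly what lets each agent's tracker $y_i^t$ stand in for the network-wide average gradient estimate in the local subproblem.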

3.3 Key Assumptions

  • $f_i$ are $L_f$-smooth in mean-square gradient; $g_k$ are $L_g$-smooth and convex.
  • Per-agent gradient noise variance satisfies $\mathbb{E}\|\nabla f_i(x,\xi_i)-\nabla f_i(x)\|^2\le\sigma_i^2$.
  • The communication matrix $W$ is symmetric and doubly stochastic with spectral quantity $\lambda\in(0,1)$ (the second-largest eigenvalue magnitude); define $\nu=(1-\lambda^2)^{-1}$.
  • Initialization need not be feasible; only bounded initial suboptimality and gradient norms are required.
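A standard way to obtain a $W$ satisfying the third assumption is the Metropolis-Hastings construction (an illustrative choice; the analysis only requires the spectral condition, not this particular recipe):

```python
import numpy as np

# Metropolis-Hastings weights on a small example graph: W[i,j] = 1/(1 + max(deg_i, deg_j))
# for edges, with the diagonal absorbing the remainder. Yields a symmetric,
# doubly stochastic matrix for any connected undirected graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

# lambda = second-largest eigenvalue magnitude; nu = (1 - lambda^2)^{-1}
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
lam = float(eigs[1])
print(f"lambda = {lam:.3f}, nu = {1.0 / (1.0 - lam**2):.3f}")
```

Better-connected graphs give smaller $\lambda$, hence smaller $\nu$ and milder constants in the complexity bounds below.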

4. Convergence and Complexity Analysis

4.1 Complexity Bounds

With the choices

$$\eta=\Theta\Bigl(\bigl(n^2/(\nu^2\bar{\sigma}^2 T)\bigr)^{1/3}\Bigr),\qquad \beta=\frac{576\,\nu^2 L^2\eta^2}{n},\qquad b_0=\Theta\bigl((nT)^{1/3}\bigr),$$

and $T=O(\epsilon^{-3/2})$, D-SMPL guarantees an $\epsilon$-approximate KKT point of the original problem with a total stochastic first-order oracle (SFO) budget per agent of

$$O\bigl(n\,(\bar{\sigma}/\epsilon^{3/2})\,\nu\bigr)=O(\epsilon^{-3/2}),$$

matching the optimal rate for unconstrained centralized non-convex stochastic optimization. No inner multi-round averaging is necessary; each iteration requires only two consensus communications.
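For concreteness, the parameter formulas can be evaluated for illustrative constants (treating the $\Theta(\cdot)$'s as equalities with constant 1, which is a simplification; every number below is hypothetical):

```python
import numpy as np

# Hypothetical problem constants, chosen so that beta lands in (0, 1]
n, L, sigma_bar = 8, 2.0, 1.0
lam = 0.6                                      # second-largest eigenvalue magnitude of W
nu = 1.0 / (1.0 - lam ** 2)                    # nu = (1 - lambda^2)^{-1}
eps = 1e-4
T = round(eps ** -1.5)                         # T = Theta(eps^{-3/2})

eta = (n ** 2 / (nu ** 2 * sigma_bar ** 2 * T)) ** (1 / 3)
beta = 576 * nu ** 2 * L ** 2 * eta ** 2 / n   # must lie in (0, 1] to be a valid momentum
b0 = round((n * T) ** (1 / 3))                 # initial batch size
print(f"T={T}, eta={eta:.4f}, beta={beta:.4f}, b0={b0}")
```

Note the couplings: $\beta\propto\eta^2$ ties the momentum weight to the step size, and a worse-connected network (larger $\nu$) forces a smaller $\eta$ before $\beta$ is valid.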

4.2 Core Analytical Ingredients

  • Consensus and gradient-tracking errors are bounded by the primal step progress $\delta^t=\|\tilde{x}^t-x^t\|^2$.
  • Prox-linear descent follows a three-point inequality ensuring decrease of the penalized objective up to controlled error.
  • Variance in the stochastic momentum is managed by balancing $\eta$ and $\beta$.
  • Approximate stationarity and near-feasibility are established via small $\delta^t$ together with strong Slater-type error bounds.

5. Communication Protocol and Efficiency

Each iteration entails two communication rounds: one for primal averages ($x$) and one for gradient-tracker averages ($y$), each across immediate neighbors using the fixed mixing matrix $W$. The method achieves $O(\epsilon^{-3/2})$ communication complexity per agent, matching its SFO complexity. This eliminates the need for nested consensus or inner loops and is robust to network structure, provided connectivity and the requisite spectral conditions hold.

6. Practical Implementation and Comparative Performance

6.1 QP Subproblem Solving

When $h$ is $\ell_1$, total variation, or similar, the subproblem QP has only linear constraints, permitting high-performance general-purpose solvers (e.g., OSQP). Warm-start strategies and the typical regime $m\ll d$ keep solve times low, yielding substantial wall-clock improvements in practice.

6.2 Numerical Experiments

Simulations for energy-optimal ocean trajectory planning (multi-USV navigation under uncertain flow forecasts and formation/speed constraints) demonstrate that D-SMPL and its SCA variant maintain the theoretical $O(\epsilon^{-3/2})$ iteration complexity and require 3–5× less wall-clock time per iteration than the DEEPSTORM (Mancino-Ball et al., 2022) and D-MSSCA baselines, with comparable or superior final energy and constraint satisfaction. The speedup is attributed to the reduced cost of linearly constrained QP subproblems relative to full convex subproblems.

7. Connections and Extensions

D-SMPL unifies several concepts: exact-penalty reformulation for constraint handling, prox-linearization for tractable subproblems, STORM/momentum for effective variance reduction (Mancino-Ball et al., 2022), and restricted double-consensus gradient tracking for network robustness. Compared to DEEPSTORM (Mancino-Ball et al., 2022), D-SMPL specifically addresses nonlinear constraint handling and utilizes exact-penalty QP subproblems instead of composite proximal steps. This suggests potential for extensions to time-varying or asynchronous networks, though current analysis presumes static, synchronous communication.

D-SMPL provides an efficient and theoretically optimal framework for decentralized non-convex constrained stochastic optimization with rigorous guarantees on oracle and communication complexity (Sharma et al., 28 Jan 2026).
