
Decentralized Stochastic Momentum Prox-Linear

Updated 30 January 2026
  • The paper demonstrates that D-SMPL integrates exact-penalty reformulation with prox-linearization and STORM momentum to achieve provably optimal oracle complexity.
  • It employs a two-round consensus gradient tracking protocol, ensuring robust decentralized convergence through effective variance reduction and constraint handling.
  • Numerical experiments validate that D-SMPL reduces iteration time and improves constraint satisfaction compared to baseline methods.

The Decentralized Stochastic Momentum-based Prox-Linear Algorithm (D-SMPL) addresses the problem of consensus-based decentralized stochastic optimization involving non-convex expected objectives with convex non-smooth regularizers and nonlinear functional inequality constraints. Each agent operates without central coordination, is restricted to querying local stochastic gradient and constraint information, and communicates through neighbor averaging via a doubly stochastic mixing matrix. D-SMPL integrates a prox-linearization of nonlinear constraints, an exact-penalty model for constraint handling, STORM-style momentum for variance reduction, and a two-round consensus-based gradient tracking protocol, achieving provably optimal complexity for this class of decentralized problems (Sharma et al., 28 Jan 2026).

1. Problem Formulation and Exact-Penalty Reformulation

Consider an undirected graph $\mathcal{G}=(\mathcal{V},\mathcal{E})$ of $n$ agents, each with a private stochastic component $f_i(x)=\mathbb{E}_{\xi_i}[f_i(x,\xi_i)]$, a common convex regularizer $h(x)$ (possibly nonsmooth), and $m$ shared smooth convex nonlinear constraints $g_k(x)\le 0$ ($k=1,\dots,m$). The global consensus-optimization task is

$$\min_{x\in\mathbb{R}^d} F(x) := \frac{1}{n}\sum_{i=1}^n f_i(x) + h(x) \quad \text{subject to} \quad g_k(x)\le 0 \;\;\forall k.$$

No central node exists; communication is performed via neighbor averaging defined by a symmetric, doubly stochastic mixing matrix $W$. The problem is recast using an exact-penalty model with parameter $\gamma>0$:

$$\min_{x\in\mathbb{R}^d}\;\Bigl\{f(x)+h(x)+\gamma\max_{k=1,\dots,m}[g_k(x)]_+\Bigr\},$$

which is equivalent to the slack-variable form with scalar slack $\nu\ge 0$:

$$F_c(x)=\min_{\nu\ge0}\;\bigl\{f(x)+h(x)+\gamma\nu\bigr\}\quad\text{s.t. } g_k(x)\le\nu\;\;\forall k.$$

For $\gamma$ sufficiently large and under a strong Slater condition, stationary points of this penalized surrogate correspond to KKT points of the original problem.
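The exact-penalty behavior can be sanity-checked on a toy one-dimensional problem (all functions below are hypothetical examples, not from the paper): below the penalty threshold the surrogate's minimizer violates the constraint, while a sufficiently large $\gamma$ recovers the constrained solution exactly.

```python
import numpy as np

# Toy 1-D instance (hypothetical, for illustration only):
# f(x) = (x - 2)^2, h = 0, one constraint g(x) = x - 1 <= 0.
# Unconstrained minimum is x = 2 (infeasible); constrained minimum is x = 1.
f = lambda x: (x - 2.0) ** 2
g = lambda x: x - 1.0

def penalized(x, gamma):
    # Exact-penalty surrogate: f(x) + h(x) + gamma * max_k [g_k(x)]_+
    return f(x) + gamma * max(g(x), 0.0)

xs = np.linspace(-1.0, 3.0, 4001)          # grid with step 0.001
minimizers = {}
for gamma in (0.5, 4.0):
    vals = [penalized(x, gamma) for x in xs]
    minimizers[gamma] = float(xs[int(np.argmin(vals))])

# Below the threshold the penalty is not exact (minimizer ~ 1.75); once
# gamma exceeds |f'(1)| = 2, the minimizer lands on the constraint boundary.
print(minimizers)
```

The threshold here is the magnitude of the objective's gradient at the constrained solution, matching the "$\gamma$ sufficiently large" requirement above.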

2. Algorithmic Workflow

D-SMPL employs local copies of primal iterates ($x_i^t$), momentum estimators ($z_i^t$), and gradient trackers ($y_i^t$) at each agent. Each iteration comprises two communication steps (consensus rounds), separated by a local quadratic program (QP) solve and a stochastic gradient update.

Iteration steps (per agent $i$, at step $t$):

  1. Prox-linear subproblem: compute $\tilde{x}_i^t$ by solving

$$\min_{x,\,\nu\ge0}\ \langle y_i^t, x\rangle + h(x) + \frac{1}{2\eta}\|x-x_i^t\|^2 + \gamma\nu$$

subject to

$$g_k(x_i^t)+\langle\nabla g_k(x_i^t),\,x-x_i^t\rangle\le\nu \quad \forall k=1,\dots,m.$$

  2. Consensus round 1 (primal averaging):

$$x_i^{t+1}=\sum_{j=1}^n W_{ij}\,\tilde{x}_j^t.$$

  3. Momentum-based gradient update (STORM recursion):

$$z_i^{t+1}=\nabla f_i(x_i^{t+1},\xi_i^{t+1})+(1-\beta)\bigl[z_i^t-\nabla f_i(x_i^t,\xi_i^{t+1})\bigr].$$

  4. Consensus round 2 (gradient tracking):

$$y_i^{t+1}=\sum_{j=1}^n W_{ij}\,y_j^t+\bigl(z_i^{t+1}-z_i^t\bigr).$$

The algorithm outputs an iterate $x_i^t$ chosen uniformly at random from $t=1,\dots,T$.
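The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's implementation: the local losses are toy quadratics, there are no constraints and $h=0$, so step 1's QP collapses to the closed form $\tilde{x}_i = x_i - \eta y_i$; the step size, momentum, noise level, and ring topology are all arbitrary choices.

```python
import numpy as np

# Minimal sketch of the D-SMPL iteration skeleton on a toy problem
# (quadratic local losses, no constraints, h = 0; all constants hypothetical).
rng = np.random.default_rng(0)
n, d, T = 5, 3, 3000
eta, beta, sigma = 0.05, 0.1, 0.1
targets = rng.normal(size=(n, d))          # f_i(x) = 0.5 * ||x - a_i||^2
opt = targets.mean(axis=0)                 # minimizer of the average loss

def stoch_grad(i, x, noise):
    # Stochastic gradient of f_i at x; STORM reuses one sample per step
    return x - targets[i] + sigma * noise

# Ring-topology mixing matrix: symmetric and doubly stochastic
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

x = np.zeros((n, d))
z = np.stack([stoch_grad(i, x[i], rng.normal(size=d)) for i in range(n)])
y = z.copy()
for t in range(T):
    x_tilde = x - eta * y                              # 1. "QP" step (closed form here)
    x_new = W @ x_tilde                                # 2. consensus round 1
    xi = rng.normal(size=(n, d))                       # one fresh sample per agent
    z_new = np.stack([stoch_grad(i, x_new[i], xi[i])
                      + (1 - beta) * (z[i] - stoch_grad(i, x[i], xi[i]))
                      for i in range(n)])              # 3. STORM recursion
    y = W @ y + (z_new - z)                            # 4. consensus round 2 (tracking)
    x, z = x_new, z_new

err = float(np.linalg.norm(x - opt, axis=1).max())
print("max distance to optimum:", err)
```

On this toy instance the agents both reach consensus and approach the global minimizer, illustrating how the two consensus rounds cooperate with the momentum recursion.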

3. Principal Components and Assumptions

3.1 Prox-linear Subproblem Structure

Each per-iteration subproblem is a linearly constrained quadratic program (QP), owing to the linearization of the nonlinear $g_k$ about $x_i^t$. When $h$ is piecewise-linear or quadratic ($\ell_1$, elastic net, total variation), the QP remains tractable for standard solvers. Warm-starting and exploiting constraint sparsity further speed up the subproblem solves.
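One hedged way to solve a single agent's subproblem (here with $h=0$ and synthetic data standing in for the local linearization) is a generic solver such as SciPy's SLSQP, since the objective is a convex quadratic and all constraints are linear in $(x,\nu)$; a dedicated QP solver would be faster in practice.

```python
import numpy as np
from scipy.optimize import minimize

# One prox-linear subproblem with h = 0 and synthetic linearization data
# (x_t, y_t, g_val, g_grad are placeholders for an agent's local quantities).
d, m, eta, gamma = 3, 2, 0.5, 5.0
rng = np.random.default_rng(1)
x_t = rng.normal(size=d)                 # current iterate x_i^t
y_t = rng.normal(size=d)                 # gradient tracker y_i^t
g_val = rng.normal(size=m)               # constraint values g_k(x_i^t)
g_grad = rng.normal(size=(m, d))         # constraint gradients at x_i^t

def obj(w):                              # decision variable w = (x, nu)
    x, nu = w[:d], w[d]
    return y_t @ x + np.sum((x - x_t) ** 2) / (2 * eta) + gamma * nu

# Linearized constraints g_k(x_t) + <grad g_k, x - x_t> <= nu, plus nu >= 0
cons = [{"type": "ineq",
         "fun": lambda w, k=k: w[d] - g_val[k] - g_grad[k] @ (w[:d] - x_t)}
        for k in range(m)]
cons.append({"type": "ineq", "fun": lambda w: w[d]})

w0 = np.concatenate([x_t, [max(0.0, g_val.max())]])   # feasible warm start
res = minimize(obj, w0, method="SLSQP", constraints=cons)
x_new, nu_new = res.x[:d], float(res.x[d])
print("converged:", res.success, "slack:", round(nu_new, 4))
```

The warm start $(x_i^t, \max_k [g_k(x_i^t)]_+)$ is always feasible for the linearized constraints, which is what makes warm-starting attractive across iterations.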

3.2 Stochastic Momentum and Gradient Tracking

The recursion for $z_i^t$ implements a STORM-style estimator, crucial for variance reduction under stochastic gradients. The two consensus rounds ensure both average agreement among agents (on $x$ and $y$) and robust tracking of the network-wide gradient estimate, enabling convergence even in fully decentralized, data-heterogeneous scenarios (Mancino-Ball et al., 2022).
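The tracking property can be verified numerically: because $W$ is doubly stochastic, the update preserves the network average, so the trackers $y$ always average to the same value as the momentum estimates $z$ when initialized with $y^0=z^0$ (toy sketch; the ring topology and random $z$ sequence are assumptions for illustration).

```python
import numpy as np

# Gradient-tracking invariant: with doubly stochastic W and y^0 = z^0,
# mean_i y_i^t == mean_i z_i^t at every step, however z evolves.
rng = np.random.default_rng(2)
n, d = 4, 2
W = np.zeros((n, n))
for i in range(n):
    W[i, i], W[i, (i - 1) % n], W[i, (i + 1) % n] = 0.5, 0.25, 0.25

z = rng.normal(size=(n, d))
y = z.copy()                               # initialize y^0 = z^0
gaps = []
for t in range(10):
    z_new = rng.normal(size=(n, d))        # arbitrary new momentum estimates
    y = W @ y + (z_new - z)                # tracking update (consensus round 2)
    z = z_new
    gaps.append(float(np.abs(y.mean(axis=0) - z.mean(axis=0)).max()))
print("max deviation from invariant:", max(gaps))
```

This conservation property is exactly what lets each agent's tracker $y_i^t$ stand in for the network-wide average gradient estimate in the local subproblem.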

3.3 Key Assumptions

  • $f_i$ are $L_f$-smooth in mean-square gradient; $g_k$ are $L_g$-smooth and convex.
  • Per-agent gradient noise variance satisfies $\mathbb{E}\|\nabla f_i(x,\xi_i)-\nabla f_i(x)\|^2\le\sigma_i^2$.
  • The communication matrix $W$ is symmetric and doubly stochastic with spectral quantity $\lambda\in(0,1)$ (the second-largest eigenvalue magnitude); define $\nu=(1-\lambda^2)^{-1}$.
  • Initialization need not be feasible; only bounded initial suboptimality and gradient norms are required.
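A standard way to obtain a $W$ satisfying the third assumption is the Metropolis-Hastings construction (an illustrative choice; the analysis only requires the spectral condition, not this particular recipe):

```python
import numpy as np

# Metropolis-Hastings weights on a small example graph: W[i,j] = 1/(1 + max(deg_i, deg_j))
# for edges, with the diagonal absorbing the remainder. Yields a symmetric,
# doubly stochastic matrix for any connected undirected graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
np.fill_diagonal(W, 1.0 - W.sum(axis=1))

# lambda = second-largest eigenvalue magnitude; nu = (1 - lambda^2)^{-1}
eigs = np.sort(np.abs(np.linalg.eigvalsh(W)))[::-1]
lam = float(eigs[1])
print(f"lambda = {lam:.3f}, nu = {1.0 / (1.0 - lam**2):.3f}")
```

Better-connected graphs give smaller $\lambda$, hence smaller $\nu$ and milder constants in the complexity bounds below.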

4. Convergence and Complexity Analysis

4.1 Complexity Bounds

With the choices

$$\eta=\Theta\Bigl(\bigl(n^2/(\nu^2\bar{\sigma}^2 T)\bigr)^{1/3}\Bigr),\qquad \beta=\frac{576\,\nu^2 L^2\eta^2}{n},\qquad b_0=\Theta\bigl((nT)^{1/3}\bigr),$$

and $T=O(\epsilon^{-3/2})$, D-SMPL guarantees an $\epsilon$-approximate KKT point of the original problem with a total stochastic first-order oracle (SFO) budget per agent of

$$O\bigl(n\,(\bar{\sigma}/\epsilon^{3/2})\,\nu\bigr)=O(\epsilon^{-3/2}),$$

matching the optimal rate for unconstrained centralized non-convex stochastic optimization. No inner multi-round averaging is necessary; each iteration requires only two consensus communications.
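For concreteness, the parameter formulas can be evaluated for illustrative constants (treating the $\Theta(\cdot)$'s as equalities with constant 1, which is a simplification; every number below is hypothetical):

```python
import numpy as np

# Hypothetical problem constants, chosen so that beta lands in (0, 1]
n, L, sigma_bar = 8, 2.0, 1.0
lam = 0.6                                      # second-largest eigenvalue magnitude of W
nu = 1.0 / (1.0 - lam ** 2)                    # nu = (1 - lambda^2)^{-1}
eps = 1e-4
T = round(eps ** -1.5)                         # T = Theta(eps^{-3/2})

eta = (n ** 2 / (nu ** 2 * sigma_bar ** 2 * T)) ** (1 / 3)
beta = 576 * nu ** 2 * L ** 2 * eta ** 2 / n   # must lie in (0, 1] to be a valid momentum
b0 = round((n * T) ** (1 / 3))                 # initial batch size
print(f"T={T}, eta={eta:.4f}, beta={beta:.4f}, b0={b0}")
```

Note the couplings: $\beta\propto\eta^2$ ties the momentum weight to the step size, and a worse-connected network (larger $\nu$) forces a smaller $\eta$ before $\beta$ is valid.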

4.2 Core Analytical Ingredients

  • Consensus and gradient-tracking errors are bounded by the primal step progress $\delta^t=\|\tilde{x}^t-x^t\|^2$.
  • Prox-linear descent follows a three-point inequality ensuring decrease of the penalized objective up to controlled error.
  • Variance in the stochastic momentum is managed by balancing $\eta$ and $\beta$.
  • Approximate stationarity and near-feasibility are established via small $\delta^t$ together with strong Slater-type error bounds.

5. Communication Protocol and Efficiency

Each iteration entails two communication rounds: one for primal averages ($x$) and one for gradient-tracker averages ($y$), each across immediate neighbors using the fixed mixing matrix $W$. The method achieves $O(\epsilon^{-3/2})$ communication complexity per agent, matching its SFO complexity. This eliminates the need for nested consensus or inner loops and is robust to network structure, provided connectivity and the requisite spectral conditions hold.

6. Practical Implementation and Comparative Performance

6.1 QP Subproblem Solving

When $h$ is $\ell_1$, total variation, or similar, the subproblem QP has only linear constraints, permitting high-performance general-purpose solvers (e.g., OSQP). Warm-start strategies and the typical regime $m\ll d$ keep solve times low, yielding substantial wall-clock improvements in practice.

6.2 Numerical Experiments

Simulations for energy-optimal ocean trajectory planning (multi-USV navigation under uncertain flow forecasts and formation/speed constraints) demonstrate that D-SMPL and its SCA variant maintain the theoretical $O(\epsilon^{-3/2})$ iteration complexity and require 3–5× less wall-clock time per iteration than the DEEPSTORM (Mancino-Ball et al., 2022) and D-MSSCA baselines, with comparable or superior final energy and constraint satisfaction. The speedup is attributed to the reduced cost of linearly constrained QP subproblems relative to full convex subproblems.

7. Connections and Extensions

D-SMPL unifies several concepts: exact-penalty reformulation for constraint handling, prox-linearization for tractable subproblems, STORM/momentum for effective variance reduction (Mancino-Ball et al., 2022), and restricted double-consensus gradient tracking for network robustness. Compared to DEEPSTORM (Mancino-Ball et al., 2022), D-SMPL specifically addresses nonlinear constraint handling and utilizes exact-penalty QP subproblems instead of composite proximal steps. This suggests potential for extensions to time-varying or asynchronous networks, though current analysis presumes static, synchronous communication.

D-SMPL provides an efficient and theoretically optimal framework for decentralized non-convex constrained stochastic optimization with rigorous guarantees on oracle and communication complexity (Sharma et al., 28 Jan 2026).
