Decentralized Stochastic Momentum Prox-Linear
- The paper demonstrates that D-SMPL integrates exact-penalty reformulation with prox-linearization and STORM momentum to achieve provably optimal oracle complexity.
- It employs a two-round consensus gradient tracking protocol, ensuring robust decentralized convergence through effective variance reduction and constraint handling.
- Numerical experiments validate that D-SMPL reduces iteration time and improves constraint satisfaction compared to baseline methods.
The Decentralized Stochastic Momentum-based Prox-Linear Algorithm (D-SMPL) addresses the problem of consensus-based decentralized stochastic optimization involving non-convex expected objectives with convex non-smooth regularizers and nonlinear functional inequality constraints. Each agent operates without central coordination, is restricted to querying local stochastic gradient and constraint information, and communicates through neighbor averaging via a doubly stochastic mixing matrix. D-SMPL integrates a prox-linearization of nonlinear constraints, an exact-penalty model for constraint handling, STORM-style momentum for variance reduction, and a two-round consensus-based gradient tracking protocol, achieving provably optimal complexity for this class of decentralized problems (Sharma et al., 28 Jan 2026).
1. Problem Formulation and Exact-Penalty Reformulation
Consider an undirected graph of $n$ agents, each with a private stochastic component $f_i(x) = \mathbb{E}_{\xi_i}[f_i(x,\xi_i)]$, a common convex regularizer $r$ (possibly nonsmooth), and shared smooth convex nonlinear constraints $g_j$ ($j = 1,\dots,m$). The global consensus-optimization task is

$$\min_{x \in \mathbb{R}^d}\ \frac{1}{n}\sum_{i=1}^n f_i(x) + r(x) \quad \text{s.t.}\quad g_j(x) \le 0,\ j = 1,\dots,m.$$

No central node exists; communication is performed via neighbor averaging defined by a symmetric, doubly stochastic mixing matrix $W$. The problem is recast using an exact-penalty model with a scalar slack variable $s \ge 0$ and penalty parameter $\beta > 0$:

$$\min_{x,\, s \ge 0}\ \frac{1}{n}\sum_{i=1}^n f_i(x) + r(x) + \beta s \quad \text{s.t.}\quad g_j(x) \le s,\ j = 1,\dots,m,$$

which is equivalent to minimizing $\frac{1}{n}\sum_i f_i(x) + r(x) + \beta \max\{0,\, \max_j g_j(x)\}$ after eliminating the slack. For sufficiently large $\beta$ and under a strong Slater condition, stationary points of this penalized surrogate correspond to KKT points of the original problem.
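The slack-eliminated form of the penalty can be evaluated directly: the exact-penalty surrogate adds $\beta\max\{0,\,\max_j g_j(x)\}$ to the composite objective. A minimal sketch (the toy functions `F`, `r`, `g` below are hypothetical, chosen only to exercise the formula):

```python
import numpy as np

def penalized_objective(x, F, r, g_list, beta):
    """Exact-penalty surrogate: F(x) + r(x) + beta * max(0, max_j g_j(x)).

    Eliminating the slack s >= 0 from the reformulation
    (minimize F + r + beta*s subject to g_j(x) <= s) yields this
    hinge-penalty form.
    """
    violation = max(0.0, max(g(x) for g in g_list))
    return F(x) + r(x) + beta * violation

# Toy instance: F(x) = ||x||^2, r(x) = ||x||_1, one constraint g(x) = x[0] - 1.
F = lambda x: float(np.dot(x, x))
r = lambda x: float(np.abs(x).sum())
g = lambda x: float(x[0] - 1.0)

x = np.array([2.0, 0.5])
val = penalized_objective(x, F, r, [g], beta=10.0)
# F = 4.25, r = 2.5, violation = 1.0, beta * violation = 10.0  ->  16.75
```

At a feasible point the hinge term vanishes and the surrogate coincides with the original composite objective, which is what makes the penalty "exact" for sufficiently large $\beta$.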
2. Algorithmic Workflow
D-SMPL employs local copies of primal iterates ($x_i^t$), momentum estimators ($v_i^t$), and gradient trackers ($d_i^t$) at each agent. Each iteration comprises two communication steps (consensus rounds) separated by a local quadratic program (QP) solve and a stochastic gradient update.
Iteration Steps (per agent $i$, at step $t$):
- Prox-linear Subproblem: Solve
$$\big(x_i^{t+1/2},\, s_i^{t+1/2}\big) = \arg\min_{x,\, s \ge 0}\ \langle d_i^t,\, x - x_i^t\rangle + r(x) + \beta s + \frac{1}{2\eta}\,\|x - x_i^t\|^2$$
subject to the linearized constraints $g_j(x_i^t) + \langle \nabla g_j(x_i^t),\, x - x_i^t\rangle \le s$ for all $j$.
- Consensus (Step 1): Update $x_i^{t+1} = \sum_{j} W_{ij}\, x_j^{t+1/2}$.
- Momentum-based Gradient Update ("STORM" recursion): $v_i^{t+1} = \nabla f_i(x_i^{t+1}; \xi_i^{t+1}) + (1-\alpha)\big(v_i^t - \nabla f_i(x_i^t; \xi_i^{t+1})\big)$.
- Consensus (Step 2, Gradient Tracking): $d_i^{t+1} = \sum_{j} W_{ij}\,\big(d_j^t + v_j^{t+1} - v_j^t\big)$.
The algorithm outputs a uniformly randomly chosen iterate from the $T$ generated iterates.
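The two consensus rounds and the STORM/tracking recursions can be sketched in NumPy on a toy decentralized least-squares problem. The local prox-linear QP is deliberately simplified to a plain gradient step, and all parameter values ($\alpha$, $\eta$, the ring-graph weights) are illustrative rather than the paper's; the point of the sketch is the update pattern, including the gradient-tracking invariant that the trackers' network average always equals the momentum estimators' average when $d^0 = v^0$ and $W$ is doubly stochastic.

```python
import numpy as np

rng = np.random.default_rng(0)
n, dim, T = 4, 3, 50
alpha, eta = 0.3, 0.1        # momentum and step-size parameters (illustrative)

# Ring graph with symmetric, doubly stochastic weights.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i + 1) % n] = 0.25
    W[i, (i - 1) % n] = 0.25

b = rng.normal(size=(n, dim))            # each agent's private target
# Local objective: f_i(x) = 0.5 * ||x - b_i||^2, with additive gradient noise.

x = np.zeros((n, dim))
v = (x - b) + 0.01 * rng.normal(size=(n, dim))   # momentum estimators
d = v.copy()                                     # gradient trackers

for t in range(T):
    # Local step, simplified here to a plain gradient step on the tracker
    # (the paper's QP with linearized constraints is elided for brevity).
    x_half = x - eta * d
    # Consensus round 1: average primal iterates over neighbors.
    x_new = W @ x_half
    # STORM recursion: the SAME stochastic sample evaluates both gradients.
    v_new = np.empty_like(v)
    for i in range(n):
        noise = 0.01 * rng.normal(size=dim)
        g_new = (x_new[i] - b[i]) + noise
        g_old = (x[i] - b[i]) + noise
        v_new[i] = g_new + (1 - alpha) * (v[i] - g_old)
    # Consensus round 2: gradient-tracking update.
    d = W @ (d + v_new - v)
    x, v = x_new, v_new

# Tracking invariant: mean over agents of d equals mean over agents of v.
print(np.allclose(d.mean(axis=0), v.mean(axis=0)))
```

Because $W$ has unit column sums, averaging both sides of the tracking update over agents telescopes, so the invariant holds at every iteration; this is the mechanism that lets each agent's tracker follow the network-wide gradient estimate.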
3. Principal Components and Assumptions
3.1 Prox-linear Subproblem Structure
Each per-iteration subproblem is a linearly constrained quadratic program (QP), owing to the linearization of the nonlinear constraints $g_j$ about the current iterate. When $r$ is piecewise-linear or quadratic ($\ell_1$, elastic net, total variation), the QP remains tractable for standard solvers. Warm-starting and exploiting constraint sparsity facilitate efficient subproblem solutions.
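As a concrete illustration of the subproblem structure, the sketch below assembles the prox-linear QP in the standard form $\min_z \frac{1}{2}z^\top P z + q^\top z$ subject to $l \le Az \le u$ accepted by solvers such as OSQP. It assumes $r \equiv 0$ for brevity, uses illustrative symbols ($\beta$, $\eta$, a tracker direction `d_t`), and only builds the matrices; the solver call itself is omitted.

```python
import numpy as np

def build_proxlinear_qp(x_t, d_t, g_vals, g_grads, beta, eta):
    """Assemble the prox-linear subproblem in QP standard form.

    Decision variable z = (x, s), with r == 0 for simplicity:
        minimize  <d_t, x - x_t> + beta*s + (1/(2*eta)) * ||x - x_t||^2
        subject to g_j(x_t) + <grad g_j(x_t), x - x_t> <= s,  s >= 0.
    """
    dim, m = x_t.size, len(g_vals)
    P = np.zeros((dim + 1, dim + 1))
    P[:dim, :dim] = np.eye(dim) / eta          # quadratic term acts on x only
    q = np.concatenate([d_t - x_t / eta, [beta]])
    # Row j encodes: grad_g_j^T x - s <= grad_g_j^T x_t - g_j(x_t).
    A = np.zeros((m + 1, dim + 1))
    l = np.full(m + 1, -np.inf)
    u = np.zeros(m + 1)
    for j in range(m):
        A[j, :dim] = g_grads[j]
        A[j, dim] = -1.0
        u[j] = g_grads[j] @ x_t - g_vals[j]
    A[m, dim] = 1.0                            # last row: s >= 0
    l[m], u[m] = 0.0, np.inf
    return P, q, A, l, u

# Tiny instance: one linearized constraint in R^2.
x_t = np.array([1.0, -1.0])
P, q, A, l, u = build_proxlinear_qp(
    x_t, d_t=np.array([0.2, -0.1]),
    g_vals=[0.5], g_grads=[np.array([1.0, 0.0])],
    beta=10.0, eta=0.1)
# The point z = (x_t, s) with s = max(0, g(x_t)) is feasible by construction.
z = np.concatenate([x_t, [0.5]])
```

All constraints are affine in $(x, s)$, which is exactly why the subproblem stays a QP even though the original $g_j$ are nonlinear.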
3.2 Stochastic Momentum and Gradient Tracking
The recursion for $v_i^t$ implements a STORM-style estimator, crucial for variance reduction under stochastic gradients. The two consensus rounds ensure both average agreement among agents (on the primal iterates and trackers) and robust tracking of the network-wide gradient estimate, enabling convergence even in fully decentralized and data-heterogeneous scenarios (Mancino-Ball et al., 2022).
3.3 Key Assumptions
- The stochastic components $f_i(\cdot,\xi_i)$ are smooth in the mean-square-gradient sense; the constraints $g_j$ are smooth and convex.
- Gradient-noise variance is bounded for each agent: $\mathbb{E}\|\nabla f_i(x;\xi_i) - \nabla f_i(x)\|^2 \le \sigma^2$.
- The communication matrix $W$ is symmetric and doubly stochastic with mixing parameter $\rho = \|W - \tfrac{1}{n}\mathbf{1}\mathbf{1}^\top\|_2 < 1$, i.e., a positive spectral gap $1 - \rho$.
- Initialization need not be feasible; only bounded initial suboptimality and gradient norms are required.
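The mixing-matrix assumptions can be checked numerically. The sketch below builds Metropolis–Hastings weights for a small path graph (a common construction for decentralized methods, not one mandated by the paper) and verifies symmetry, double stochasticity, and $\rho = \|W - \frac{1}{n}\mathbf{1}\mathbf{1}^\top\|_2 < 1$.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric, doubly stochastic mixing matrix from an undirected graph,
    via the Metropolis-Hastings rule W_ij = 1/(1 + max(deg_i, deg_j))."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()   # diagonal absorbs the remaining mass
    return W

# Path graph on 4 nodes: 0 - 1 - 2 - 3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
W = metropolis_weights(adj)
n = adj.shape[0]
# Mixing parameter: spectral norm of W minus the averaging matrix.
rho = np.linalg.norm(W - np.ones((n, n)) / n, 2)
```

For any connected graph with positive self-weights this construction yields $\rho < 1$, so the spectral-gap assumption is satisfied without any global coordination when building $W$.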
4. Convergence and Complexity Analysis
4.1 Complexity Bounds
With step-size, momentum, and penalty parameters chosen as functions of the target accuracy $\epsilon$, D-SMPL guarantees an $\epsilon$-approximate KKT point for the original problem within $\mathcal{O}(\epsilon^{-3})$ total stochastic first-order oracle (SFO) calls per agent, matching the optimal rate for unconstrained centralized non-convex stochastic optimization. No inner multi-round averaging is necessary; each iteration requires only two consensus communications.
4.2 Core Analytical Ingredients
- Consensus and gradient-tracking errors are bounded in terms of the primal step progress $\sum_i \|x_i^{t+1/2} - x_i^t\|^2$.
- Prox-linear descent follows a three-point inequality ensuring decrease of the penalized objective up to controlled error.
- Variance in the stochastic momentum is managed by balancing the momentum parameter $\alpha$ against the step size $\eta$.
- Approximate stationarity and near-feasibility are established via small slack values and strong Slater-type error bounds.
5. Communication Protocol and Efficiency
Each iteration entails two communication rounds across immediate neighbors using the fixed mixing matrix $W$: a first for primal averages and a second for gradient-tracker averages. The method achieves $\mathcal{O}(\epsilon^{-3})$ communication complexity per agent, matching its SFO complexity. This approach eliminates the need for nested consensus or inner loops and is robust to network structure, as long as connectivity and the requisite spectral conditions are met.
6. Practical Implementation and Comparative Performance
6.1 QP Subproblem Solving
When $r$ is $\ell_1$, total variation, or similar, the subproblem QP entails only linear constraints, permitting high-performance general-purpose solvers (e.g., OSQP). Warm-start strategies and the small size of typical subproblem instances keep solve times low. This leads to substantial wall-clock improvements in practice.
6.2 Numerical Experiments
Simulations for energy-optimal ocean trajectory planning (multi-USV navigation under uncertain flow forecasts and formation/speed constraints) demonstrate that D-SMPL and its SCA variant maintain the theoretical iteration complexity and run at least $3\times$ faster per iteration in wall-clock time than the DEEPSTORM (Mancino-Ball et al., 2022) and D-MSSCA baselines, with comparable or superior final energy and constraint satisfaction. This performance gain is attributed to the reduced cost of linearly constrained QP subproblems relative to full convex subproblems.
7. Connections and Extensions
D-SMPL unifies several concepts: exact-penalty reformulation for constraint handling, prox-linearization for tractable subproblems, STORM/momentum for effective variance reduction (Mancino-Ball et al., 2022), and restricted double-consensus gradient tracking for network robustness. Compared to DEEPSTORM (Mancino-Ball et al., 2022), D-SMPL specifically addresses nonlinear constraint handling and utilizes exact-penalty QP subproblems instead of composite proximal steps. This suggests potential for extensions to time-varying or asynchronous networks, though current analysis presumes static, synchronous communication.
D-SMPL provides an efficient and theoretically optimal framework for decentralized non-convex constrained stochastic optimization with rigorous guarantees on oracle and communication complexity (Sharma et al., 28 Jan 2026).