Regularized Dual ADMM for Structured Convex Optimization

Updated 26 November 2025
  • Regularized Dual ADMM is a method that extends traditional ADMM by integrating explicit regularization to handle composite convex optimization problems with structured constraints.
  • It employs dual splitting using Bregman divergence, achieving robust convergence guarantees, sparsity, and efficient parallel updates in large-scale settings.
  • Applications span sparse modeling, structured learning, and motion planning, demonstrating practical speedups and enhanced performance in nonconvex reformulations.

Regularized Dual Alternating Direction Method of Multipliers (RDA)—often referred to in research contexts as Regularized Dual ADMM or, in Bregman generalizations, Bregman ADMM—addresses composite convex optimization and constrained problems by integrating explicit regularization structures into ADMM-type primal-dual splitting schemes. RDA arises both in online convex optimization settings (dual averaging, composite mirror descent, FTRL families) and in large-scale structured problems where dual decompositions and parallelization are essential. It supports rigorous theoretical guarantees and practical efficiency in sparse modeling, structured regularization, distributed computation, and nonconvex reformulations.

1. Mathematical Formulation and Core Principles

RDA accommodates convex minimization objectives subject to linear or structured constraints, with composite regularization terms. Consider the linearly-constrained convex program:

$$\min_{u\in\mathbb{R}^p,\, v\in\mathbb{R}^q}\; f(u) + g(v) \quad \text{subject to} \quad M u + N v = b$$

where $f$ and $g$ are proper, closed convex functions, and $(M, N, b)$ encode the problem structure. An explicit regularization term (e.g., $\ell_1$ penalty, group structure, KL-divergence) is handled by composite objective definitions.

RDA operates primarily via splitting methods applied to the Fenchel dual, with dual variables representing constraint multipliers. The Bregman generalization introduces a Legendre function $h$ to define the Bregman divergence $D_h(x, y) = h(x) - h(y) - \langle\nabla h(y), x - y\rangle$, with quadratic $h$ yielding Euclidean geometry as a special case (Ma et al., 10 Sep 2025).
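
For concreteness, here is a minimal numerical check of the divergence definition (an illustrative sketch only); the quadratic and negative-entropy choices of $h$ are the two special cases referenced throughout.

```python
import numpy as np

def bregman_divergence(h, grad_h, x, y):
    """D_h(x, y) = h(x) - h(y) - <grad h(y), x - y>."""
    return h(x) - h(y) - grad_h(y) @ (x - y)

x = np.array([0.3, 0.7])
y = np.array([0.5, 0.5])

# Quadratic h: recovers half the squared Euclidean distance.
h_quad = lambda z: 0.5 * z @ z
print(bregman_divergence(h_quad, lambda z: z, x, y),
      0.5 * np.sum((x - y) ** 2))

# Negative entropy h on the simplex: recovers the KL divergence.
h_ent = lambda z: np.sum(z * np.log(z))
print(bregman_divergence(h_ent, lambda z: np.log(z) + 1.0, x, y),
      np.sum(x * np.log(x / y)))
```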

2. Algorithmic Frameworks: Primal–Dual Updates and Regularization

The base RDA update, in primal form for a convex domain X\mathcal{X}, is:

$$x_{t+1} = \arg\min_{x\in\mathcal{X}} \; \langle G_t, x \rangle + t\lambda \Psi(x) + R_{1:t}(x)$$

where $G_t = \sum_{s=1}^t g_s$ is the accumulation of subgradients from previous loss functions, $\Psi(x)$ is the explicit penalty (often $\ell_1$), and $R_{1:t}(x)$ is a quadratic or Bregman regularizer (McMahan, 2010).
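
For instance, with $\Psi(x) = \|x\|_1$ and the quadratic choice $R_{1:t}(x) = \frac{1}{2\eta_t}\|x\|^2$ (a standard instantiation; the step-size notation $\eta_t$ is an assumption of this sketch), the minimization separates across coordinates, and the coordinate-wise optimality condition

$$0 \in [G_t]_i + t\lambda\,\partial |x_i| + \frac{x_i}{\eta_t}$$

gives $x_{t+1,i} = 0$ whenever $|[G_t]_i| \le t\lambda$, and $x_{t+1,i} = -\eta_t \operatorname{sign}([G_t]_i)\left(|[G_t]_i| - t\lambda\right)$ otherwise; this is exactly the soft-thresholding formula stated in Section 3.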

Generalized to two-block splitting with Bregman regularization, iterates update via:

$$\begin{aligned}
u^{k+1} &= \arg\min_{u} \left\{ f(u) + \frac{1}{\gamma}\, h^*\!\left(\nabla h(w^k) + \gamma\,(Mu + N v^k - b)\right) \right\} \\
v^{k+1} &= \arg\min_{v} \left\{ g(v) + \frac{1}{\gamma}\, h^*\!\left(\nabla h(w^k) + \gamma\,(Mu^{k+1} + N v - b)\right) \right\} \\
w^{k+1} &= \nabla h^*\!\left(\nabla h(w^k) + \gamma\,(Mu^{k+1} + N v^{k+1} - b)\right)
\end{aligned}$$

The parameter $\gamma$ serves as the regularization penalty, with its decay rate impacting convergence properties (Ma et al., 10 Sep 2025).
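
To make the quadratic-$h$ special case concrete, the following is a minimal sketch (not taken from the cited papers) of the resulting classical ADMM iteration applied to an $\ell_1$-regularized least-squares splitting with $M = I$, $N = -I$, $b = 0$; the problem data, penalty $\gamma$, and iteration count are illustrative assumptions.

```python
import numpy as np

def soft_threshold(x, tau):
    """Elementwise soft-thresholding, the prox of tau * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def admm_lasso(A, y, lam, gamma=1.0, iters=200):
    """Quadratic-h (Euclidean) special case of the Bregman iteration above:
    minimize 0.5*||A u - y||^2 + lam*||v||_1  subject to  u - v = 0."""
    n = A.shape[1]
    u = np.zeros(n)
    v = np.zeros(n)
    w = np.zeros(n)                      # multiplier for the constraint u - v = 0
    AtA, Aty = A.T @ A, A.T @ y
    L = np.linalg.cholesky(AtA + gamma * np.eye(n))   # factor once, reuse every iteration
    for _ in range(iters):
        # u-step: quadratic subproblem (A^T A + gamma I) u = A^T y - w + gamma v
        rhs = Aty - w + gamma * v
        u = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        # v-step: prox of the l1 term (closed-form soft-thresholding)
        v = soft_threshold(u + w / gamma, lam / gamma)
        # w-step: with quadratic h, grad h and grad h* are identities, so the
        # multiplier update reduces to w <- w + gamma*(Mu + Nv - b)
        w = w + gamma * (u - v)
    return v

rng = np.random.default_rng(0)
A = rng.standard_normal((80, 30))
x_true = np.zeros(30)
x_true[:5] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(80)
print(np.round(admm_lasso(A, y, lam=0.5), 3))
```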

3. Sparsity, Composite Objectives, and Closed-Form Solutions

For composite regularizers (notably $\ell_1$), RDA provides coordinate-wise closed-form updates:

$$x_{t+1, i} = \operatorname{sign}(-[G_t]_i)\, \max\{0,\ |[G_t]_i| - t\lambda\} \,/\, (1/\eta_t)$$

This exact handling of cumulative regularization results in substantially more sparsity compared to local-linearized approaches (e.g., FOBOS/composite mirror descent), which only penalize via subgradient approximations from the current round. RDA's global inclusion of $t\lambda \|\cdot\|_1$ enforces precise soft-thresholding, driving more coordinates to zero (McMahan, 2010).
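
A minimal sketch of this update inside an online loop is below. It is illustrative only: the quadratic regularizer $R_{1:t}(x) = \frac{\sqrt{t}}{2\alpha}\|x\|^2$ (i.e. $\eta_t = \alpha/\sqrt{t}$), the least-squares losses, and all hyperparameters are assumptions of the sketch, not prescriptions from the cited work.

```python
import numpy as np

def rda_l1(subgradient_fn, dim, lam, alpha=1.0, rounds=100):
    """Minimal RDA loop with l1 penalty and quadratic regularizer
    R_{1:t}(x) = (sqrt(t)/(2*alpha)) * ||x||^2, i.e. eta_t = alpha/sqrt(t).
    subgradient_fn(t, x) must return a subgradient of the round-t loss at x."""
    x = np.zeros(dim)
    G = np.zeros(dim)                       # running sum of subgradients
    for t in range(1, rounds + 1):
        G += subgradient_fn(t, x)
        eta_t = alpha / np.sqrt(t)
        # coordinate-wise closed form: soft-threshold the accumulated gradient
        x = -np.sign(G) * np.maximum(np.abs(G) - t * lam, 0.0) * eta_t
    return x

# Illustrative use: least-squares losses f_t(x) = 0.5*(a_t.x - b_t)^2
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true

x_hat = rda_l1(lambda t, x: A[t - 1] * (A[t - 1] @ x - b[t - 1]),
               dim=20, lam=0.05, alpha=0.5, rounds=100)
print("nonzeros:", np.flatnonzero(x_hat))
```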

Complex structured regularizers (e.g., overlapped group lasso, fused lasso, trace-norm) are managed via a splitting matrix $B$ and transformation into prox-friendly domains, allowing efficient updates for elaborate penalty types (1311.0622).
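
As one concrete instance (an illustrative construction, not code from the cited work): for the 1-D fused lasso penalty $\lambda\sum_i |x_{i+1}-x_i| = \lambda\|Bx\|_1$, taking $B$ as the first-difference matrix and introducing the split $z = Bx$ with constraint $Bx - z = 0$ moves the penalty onto $z$, whose subproblem is again elementwise soft-thresholding.

```python
import numpy as np

def first_difference_matrix(n):
    """Splitting matrix B for the 1-D fused lasso: (Bx)_i = x_{i+1} - x_i."""
    B = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    B[idx, idx] = -1.0
    B[idx, idx + 1] = 1.0
    return B

B = first_difference_matrix(5)
x = np.array([0.0, 0.0, 3.0, 3.0, 3.0])
print(B @ x)   # nonzero only at the single jump -> l1 on Bx promotes piecewise-constant x
```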

4. Convergence Analysis, Regret Bounds, and Assumptions

RDA exhibits favorable convergence and regret properties. For online convex optimization, with bounded subgradients $\|g_t\| \leq G$ and domain diameter $D$, using quadratic rates $\sigma_{1:t} = G\sqrt{2/t}/D$ yields:

$$\operatorname{Regret} \leq \frac{1}{2} D G \sqrt{2T} + \frac{GD}{\sqrt{2}} \ln T + O(GD)$$

For composite objectives, the logarithmic term persists; for pure regularization (no composite term), the bound reduces to $O(DG\sqrt{T})$ (McMahan, 2010).

With Bregman regularization, convergence analysis assumes:

  • Strong convexity of $h$
  • Lipschitz continuity of $\nabla h^*$
  • Bounded subgradients $\|\partial f\|_\infty,\ \|\partial g\|_\infty \leq G$
  • Nonincreasing sequence $\gamma_k \rightarrow 0$

and delivers an $O(1/\sqrt{k})$ sublinear rate in primal objective gap:

$$\min_{1 \leq t \leq k}\left\{f(u^t) + g(v^t) - f(u^*) - g(v^*)\right\} = O(1/\sqrt{k})$$

Special cases recover classical and variable-metric ADMM (quadratic $h$) and exponential-multiplier schemes (entropy $h$) (Ma et al., 10 Sep 2025).
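
A brief check of these specializations, using the notation above (the entropy case assumes the negative-entropy Legendre function on the positive orthant, a standard but here assumed choice): with residual $r^{k+1} = Mu^{k+1} + Nv^{k+1} - b$,

$$h(w) = \tfrac{1}{2}\|w\|^2:\quad \nabla h = \nabla h^* = \mathrm{id} \;\Longrightarrow\; w^{k+1} = w^k + \gamma\, r^{k+1},$$

$$h(w) = \sum_i \left(w_i \log w_i - w_i\right):\quad \nabla h(w) = \log w,\ \ \nabla h^*(y) = e^{y} \;\Longrightarrow\; w^{k+1} = w^k \odot \exp\!\left(\gamma\, r^{k+1}\right),$$

recovering the classical augmented-Lagrangian multiplier step and the exponential-multiplier update, respectively.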

5. Stochastic and Parallelized RDA Variants

Stochastic Dual Coordinate Ascent with ADMM (SDCA–ADMM) leverages RDA principles in settings with massive data and complex regularization. It partitions variables into sub-batches, applies proximal updates per batch, and achieves linear (exponential) convergence rates in composite Lyapunov metrics under mild strong-convexity/smoothness assumptions. Sub-batching offers 2–3× acceleration over pure batch methods, with memory usage scaling as $O(p+n)$ (1311.0622).
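
The sub-batching pattern itself is easy to illustrate. The sketch below is a generic randomized sub-batch proximal update for an $\ell_1$-regularized least-squares objective, shown only to make the per-batch update structure concrete; it is not the SDCA–ADMM update of (1311.0622).

```python
import numpy as np

def soft_threshold(x, tau):
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def sub_batched_prox(A, y, lam, batch_size=16, step=0.01, epochs=20, seed=0):
    """Generic sub-batched proximal updates for 0.5*||Ax - y||^2 + lam*||x||_1.
    Each inner step touches one randomly chosen block of samples (the dual
    coordinates in SDCA-style methods), so per-step cost scales with the batch."""
    rng = np.random.default_rng(seed)
    n, p = A.shape
    x = np.zeros(p)
    batches = np.array_split(rng.permutation(n), max(1, n // batch_size))
    for _ in range(epochs):
        for b in rng.permutation(len(batches)):
            rows = batches[b]
            grad = A[rows].T @ (A[rows] @ x - y[rows])   # gradient over the sub-batch only
            x = soft_threshold(x - step * grad, step * lam)
    return x
```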

In nonconvex and bi-convex problems—particularly in MPC-based motion planning—RDA enables decomposition into primal and obstacle-specific dual blocks, so that the dual subproblems for all $M$ collision constraints are solved in parallel at each MPC step. This parallel structure renders the overall method highly scalable; practical implementations demonstrate near-constant computation time as the number of obstacles increases (Han et al., 2022).
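
The parallel dispatch is structural and easy to sketch. In the toy code below, the per-obstacle logic is a stand-in (a nonnegative projected-gradient step on a fabricated residual), and the function and variable names are hypothetical; only the "independent dual block per obstacle, shared primal trajectory" pattern reflects the cited design.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def dual_block_update(dual, trajectory, obstacle_center, gamma=0.1):
    """Stand-in for one obstacle's dual subproblem: a projected gradient step
    keeping the dual nonnegative. The real RDA planner solves a small convex
    program per obstacle; only the parallel structure is shown here."""
    violation = 1.0 - np.linalg.norm(trajectory - obstacle_center, axis=1)  # toy residual
    return np.maximum(dual + gamma * violation, 0.0)

trajectory = np.linspace([0.0, 0.0], [5.0, 5.0], num=20)      # current primal MPC trajectory (toy)
obstacles = [np.array([2.0, 2.5]), np.array([4.0, 1.0]), np.array([3.0, 4.0])]
duals = [np.zeros(len(trajectory)) for _ in obstacles]

# All per-obstacle dual blocks depend only on the shared primal trajectory,
# so they can be dispatched concurrently within each MPC step.
with ThreadPoolExecutor() as pool:
    duals = list(pool.map(lambda args: dual_block_update(*args),
                          [(d, trajectory, c) for d, c in zip(duals, obstacles)]))
print([np.count_nonzero(d) for d in duals])
```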

6. Applications: Structured Learning, Optimal Transport, and Motion Planning

RDA finds applications in high-dimensional sparse regression, group lasso, graph-guided fused lasso, trace-norm penalization, and optimal transport (1311.0622, Ma et al., 10 Sep 2025). The Bregman ADMM and exponential-multiplier specializations support entropy-regularized transport and other compositional domains.

In autonomous navigation, RDA delivers accelerated collision-free motion planning by reformulating nonconvex MPC constraints as smooth bi-convex programs amenable to dual splitting and parallel computation per obstacle. The practical impact includes:

  • Real-time planning with Ackermann kinematics and non-point-mass shapes
  • Adaptive clearance margins using dynamic safety distance vectors with $\ell_1$ regularization
  • Empirical speedups (2–3× over interior-point benchmarks), increased robustness (95% success rate versus 80% for TEB), and reduced solution times (Han et al., 2022).

7. Unifying View: FTRL, Mirror Descent, and ADMM

RDA, FTRL-Proximal, and composite-objective mirror descent (COMID, FOBOS) share a common template:

$$x_{t+1} = \arg\min_x \left\{ \langle g'_{1:t-1}, x \rangle + f_t(x) + \alpha_{1:t} \Psi(x) + R_{1:t}(x) \right\}$$

where choices of stabilization (quadratic, Bregman), regularizer accumulation (global vs. local), and subgradient evaluation (implicit vs. explicit) determine the specific algorithmic instance (McMahan, 2010). Mirror descent arises with Bregman sum regularization and linearized loss, further illustrating the breadth of the RDA framework in first-order online and batch settings.
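
A rough dictionary under this template (a sketch of the standard identifications with notation taken from the display above, not an exhaustive statement):

$$\text{RDA:}\;\; R_{1:t}(x) = \tfrac{1}{2\eta_t}\|x\|^2 \ \text{(centered at the origin)},\;\; \alpha_{1:t} = t\lambda,\;\; f_t \ \text{linearized};$$

$$\text{FTRL-Proximal:}\;\; R_{1:t}(x) = \sum_{s \le t} \tfrac{\sigma_s}{2}\|x - x_s\|^2 \ \text{(centered at past iterates)};$$

$$\text{COMID/FOBOS:}\;\; \text{only the current-round penalty } \lambda \Psi(x) \text{ is applied (local), with a Bregman divergence from } x_t.$$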

In summary, Regularized Dual Alternating Direction Method of Multipliers constitutes a versatile and theoretically rigorous architecture for solving high-dimensional, structured, and constrained optimization problems with explicit regularization. Its adaptability to various splitting schemes, regularizer forms, and parallelized or stochastic updates makes it central in modern algorithmic convex optimization and motion planning.
