Adaptive Riemannian ADMM for Nonsmooth Problems

Updated 23 October 2025
  • The paper introduces ARADMM, which efficiently solves structured nonsmooth optimization problems on compact Riemannian submanifolds without smoothing the nonsmooth component.
  • It couples adaptive tuning of the penalty and stepsize parameters with proximal-operator, Riemannian-gradient, and retraction steps to guarantee an optimal iteration complexity of O(ε⁻³).
  • Empirical results on sparse PCA and robust subspace recovery demonstrate ARADMM’s superior performance compared to smoothing-based approaches.

Adaptive Riemannian Alternating Direction Method of Multipliers (ARADMM) is a class of algorithms designed to solve structured nonsmooth composite optimization problems constrained to compact Riemannian submanifolds embedded in Euclidean space. ARADMM generalizes classical ADMM by handling both nonsmooth terms and manifold constraints, combining proximal operator techniques, Riemannian geometry (retraction, tangent spaces), and adaptive parameter strategies. The intent is to address applications such as sparse principal component analysis and robust subspace recovery without requiring smooth approximation of the nonsmooth regularizer, while achieving optimal or near-optimal iteration complexity (Deng et al., 21 Oct 2025).

1. Mathematical Formulation and Problem Setting

The canonical problem addressed by ARADMM is

$$\min_{x \in \mathcal{M}} f(x) + h(Ax),$$

where $f$ is smooth (typically with Lipschitz-continuous gradient), $h$ is a proper, closed, convex but nonsmooth function (e.g., the $\ell_1$ norm), $A$ is a linear map, and $\mathcal{M}$ is a compact Riemannian submanifold (such as the Stiefel manifold or the sphere). This framework encodes both structural constraints (via $\mathcal{M}$) and regularization or sparsity priors (via $h$).

To decouple the smooth and nonsmooth components, an auxiliary splitting variable yy is introduced, reformulating the problem as

$$\min_{x \in \mathcal{M},\, y} f(x) + h(y) \quad \text{s.t. } Ax = y.$$

The augmented Lagrangian is then

$$\mathcal{L}_\rho(x, y, \lambda) = f(x) + h(y) - \langle \lambda, Ax - y \rangle + \frac{\rho}{2}\|Ax - y\|^2.$$

This setup allows $h$ to be handled via its proximal operator, and the Riemannian constraint to be managed using geometric optimization methods.
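
As a concrete illustration of these two ingredients, the sketch below shows the proximal map for the common choice $h = \mu\|\cdot\|_1$ (soft-thresholding) together with a QR-based retraction and tangent-space projection for the Stiefel manifold. This is a minimal, hedged example: the function names are hypothetical and not taken from the paper or any particular library.

```python
import numpy as np

def prox_l1(v, t, mu=1.0):
    """Proximal map of t * mu * ||.||_1 at v (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t * mu, 0.0)

def retract_stiefel(x, xi):
    """QR-based retraction on the Stiefel manifold St(n, p) = {X : X^T X = I_p}."""
    q, r = np.linalg.qr(x + xi)
    # Fix column signs so that retract_stiefel(x, 0) == x.
    return q * np.sign(np.sign(np.diag(r)) + 0.5)

def riem_grad_stiefel(x, egrad):
    """Riemannian gradient at x: project the Euclidean gradient onto the tangent space."""
    sym = (x.T @ egrad + egrad.T @ x) / 2.0
    return egrad - x @ sym
```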

2. Algorithmic Framework and Adaptive Coordination

ARADMM integrates three core iterative components per step (Deng et al., 21 Oct 2025):

  • y-update: $y_{k+1} = \operatorname{prox}_{h/\rho_k}(Ax_k - \lambda_k/\rho_k)$, leveraging the proximal operator of $h$ to solve the nonsmooth subproblem exactly;
  • x-update: $x_{k+1} = \mathscr{R}_{x_k}(-\tau_k \operatorname{grad} \Phi_k(x_k))$, where $\Phi_k(x) = \mathcal{L}_{\rho_k}(x, y_{k+1}, \lambda_k)$, $\operatorname{grad}$ denotes the Riemannian gradient at $x_k$, and $\mathscr{R}_{x_k}$ is a retraction enforcing $x_{k+1} \in \mathcal{M}$;
  • Dual update: $\lambda_{k+1} = \lambda_k - \gamma_{k+1}(Ax_{k+1} - y_{k+1})$, with an adaptive dual stepsize $\gamma_{k+1}$.

A defining feature is the adaptive selection of penalty and stepsize parameters. The penalty parameter $\rho_k$ is typically chosen to scale with the iteration count (e.g., $\rho_k = c_\rho k^{1/3}$), while the primal stepsize is set as $\tau_k = c_\tau k^{-1/3}$. The dual stepsize $\gamma_{k+1}$ is adapted at every iteration to balance dual progress against constraint violation, directly exploiting observed progress patterns in the iterate sequence.

This adaptive coordination of $(\rho_k, \tau_k, \gamma_{k+1})$ is central to guaranteeing sufficient descent in a suitable potential function, maintaining algorithmic stability and convergence rates, and obviating the need for smooth approximations of $h$.
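
A minimal sketch of how the three updates and the adaptive schedules fit together is given below. It follows the sign conventions of the augmented Lagrangian above and the $\rho_k \sim k^{1/3}$, $\tau_k \sim k^{-1/3}$ schedules; the dual-stepsize rule shown is only a simple stand-in for the paper's adaptive choice, and all oracle names are assumptions rather than an API from the paper.

```python
import numpy as np

def aradmm_sketch(x0, A, grad_f, prox_h, retract, proj_tangent,
                  c_rho=1.0, c_tau=1.0, gamma_max=1.0,
                  max_iter=1000, tol=1e-6):
    """Illustrative adaptive Riemannian ADMM loop (not the authors' implementation).

    grad_f(x)          -> Euclidean gradient of f at x
    prox_h(v, t)       -> proximal map of t*h at v
    retract(x, xi)     -> retraction of tangent vector xi at x
    proj_tangent(x, g) -> projection of an ambient vector g onto T_x M
    """
    x = x0
    lam = np.zeros_like(A @ x0)
    for k in range(1, max_iter + 1):
        rho = c_rho * k ** (1.0 / 3.0)    # growing penalty rho_k
        tau = c_tau * k ** (-1.0 / 3.0)   # shrinking primal stepsize tau_k

        # y-update: exact proximal step on the nonsmooth term.
        y = prox_h(A @ x - lam / rho, 1.0 / rho)

        # x-update: one Riemannian gradient step on
        # Phi_k(x) = f(x) + h(y) - <lam, Ax - y> + (rho/2) * ||Ax - y||^2.
        egrad = grad_f(x) + A.T @ (rho * (A @ x - y) - lam)
        rgrad = proj_tangent(x, egrad)
        x = retract(x, -tau * rgrad)

        # Dual update with a simple heuristic stepsize (a stand-in for the
        # adaptive rule balancing dual progress against constraint violation).
        residual = A @ x - y
        gamma = min(gamma_max, rho)
        lam = lam - gamma * residual

        if np.linalg.norm(residual) < tol and np.linalg.norm(rgrad) < tol:
            break
    return x, y, lam
```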

3. Complexity Analysis and Theoretical Guarantees

Under mild assumptions (compactness of $\mathcal{M}$, Lipschitz-continuous gradient for $f$, and proper closed convexity of $h$), ARADMM achieves an iteration complexity of $\mathcal{O}(\epsilon^{-3})$ for producing an $\epsilon$-approximate KKT point (Deng et al., 21 Oct 2025). Specifically, after $\mathcal{O}(\epsilon^{-3})$ iterations, the produced tuple $(x, y, \lambda)$ satisfies the necessary stationarity, feasibility, and dual conditions to within an $\epsilon$ tolerance.
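
For orientation, a generic formalization of such an $\epsilon$-approximate KKT point for the split problem reads as follows (the exact norms and constants used by Deng et al., 21 Oct 2025 may differ): a tuple $(x, y, \lambda)$ with $x \in \mathcal{M}$ satisfies

$$\begin{aligned} &\big\|\mathrm{P}_{T_x\mathcal{M}}\big(\nabla f(x) - A^\top \lambda\big)\big\| \le \epsilon, \\ &\operatorname{dist}\big({-\lambda},\, \partial h(y)\big) \le \epsilon, \\ &\|Ax - y\| \le \epsilon, \end{aligned}$$

where $\mathrm{P}_{T_x\mathcal{M}}$ denotes projection onto the tangent space at $x$ and the sign of $\lambda$ follows the convention of the augmented Lagrangian above; the three conditions correspond to Riemannian stationarity in $x$, the dual condition in $y$, and primal feasibility, respectively.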

Unlike previous Riemannian ADMM approaches that require smoothing the nonsmooth $h$ (using, for example, Moreau envelopes (Li et al., 2022)), ARADMM achieves this complexity without any such regularization, relying directly on proximal computations. Smoothing-based Riemannian ADMM variants can also reach $\mathcal{O}(\epsilon^{-3})$ complexity, but at the cost of additional parameter selection and approximation error.

Key properties enabling this rate include:

  • Each iteration performs only one Riemannian gradient/retraction and one proximal step;
  • The adaptive penalty/stepsize coordination ensures synchrony between primal and dual progress despite the underlying nonconvexity (due to the manifold constraint) and nonsmoothness.

Table: Algorithmic Differences for Riemannian Nonsmooth Optimization

| Method | Handles nonsmooth $h$ directly? | Smoothing/Moreau envelope required? | Adaptive parameters? | Iteration complexity |
|---|---|---|---|---|
| ARADMM (Deng et al., 21 Oct 2025) | Yes | No | Yes | $\mathcal{O}(\epsilon^{-3})$ |
| Riemannian ADMM (Li et al., 2022) | No | Yes | Some (manual tuning) | $\mathcal{O}(\epsilon^{-4})$ |
| Classical ADMM (Euclidean) | Yes | No | Yes (recent variants) | $\mathcal{O}(\epsilon^{-3})$ |

Earlier Riemannian ADMM algorithms required smoothing $h$, increasing the number of parameters (e.g., a smoothing parameter $\gamma$) and possibly degrading the convergence rate to $\mathcal{O}(\epsilon^{-4})$ (Li et al., 2022). ARADMM avoids this by leveraging adaptive penalty updates and direct use of the proximal map, while still matching or improving on the best theoretical rates.

4. Relation to Adaptive Euclidean ADMM

Adaptive strategies in classical Euclidean ADMM—such as spectral stepsize selection, residual balancing, and variable-stepsize updates (Xu et al., 2016, Xu et al., 2017, Bartels et al., 2017, Lorenz et al., 2018, Wang, 17 Apr 2024)—inspire much of the ARADMM parameter-updating logic, but extending these ideas to the Riemannian context requires careful handling of curvature, retraction, and non-Euclidean geometry.

5. Applications and Empirical Results

Demonstrated domains include:

  • Sparse Principal Component Analysis (PCA): formulated as maximizing a quadratic form over the Stiefel manifold with an $\ell_1$ or group-sparsity-inducing $h$ (see the sketch after this list). ARADMM achieves lower objective values and faster convergence—both in iteration count and CPU time—compared to state-of-the-art Riemannian ADMM baselines (MADMM, RADMM, OADMM).
  • Robust Subspace Recovery / Dual Principal Component Pursuit (DPCP): ARADMM again outperforms alternatives, producing lower objective values and requiring fewer iterations to reach high-accuracy feasibility.
  • In these experiments, ARADMM’s adaptive parameter coordination leads to tighter enforcement of constraints (i.e., $\|Ax - y\|$ is reduced) and better solution quality, empirically confirming the practical utility of the theoretical advances.
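
To make the sparse PCA mapping concrete, the hedged sketch below casts it in the template of Section 1: $f(X) = -\tfrac{1}{2}\operatorname{tr}(X^\top \Sigma X)$ over the Stiefel manifold, $h = \mu\|\cdot\|_1$, and $A = I$. It reuses the hypothetical helpers from the earlier sketches and illustrates the problem formulation, not the authors' experimental setup.

```python
import numpy as np

# Sparse PCA:  min_{X in St(n, p)}  -1/2 * tr(X^T Sigma X) + mu * ||X||_1,
# written in the ARADMM template with A = I (y acts as a sparse copy of X).
rng = np.random.default_rng(0)
n, p, mu = 50, 3, 0.1
Z = rng.standard_normal((200, n))
Sigma = Z.T @ Z / 200.0                                   # sample covariance

grad_f = lambda X: -Sigma @ X                             # gradient of -1/2 tr(X^T Sigma X)
prox_h = lambda V, t: np.sign(V) * np.maximum(np.abs(V) - mu * t, 0.0)

X0 = np.linalg.qr(rng.standard_normal((n, p)))[0]         # orthonormal starting point

# Hypothetical call using the loop and Stiefel helpers sketched above.
X, Y, Lam = aradmm_sketch(X0, np.eye(n), grad_f, prox_h,
                          retract_stiefel, riem_grad_stiefel)
```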

6. Extensions and Implementation Considerations

ARADMM requires only proximal-operator access for $h$, gradient evaluations for the Lipschitz-smooth $f$, and retraction/Riemannian-gradient tools for the manifold $\mathcal{M}$. The theoretical analysis leverages technical ingredients such as bounding the variation of multipliers across changing tangent spaces and smoothness properties of the retraction.
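
This oracle-access pattern can be summarized as a small interface; the sketch below is an illustrative assumption, not an API defined by the paper or any existing library.

```python
from typing import Protocol
import numpy as np

class ARADMMOracles(Protocol):
    """Hypothetical bundle of the only oracles the method needs."""

    def prox_h(self, v: np.ndarray, t: float) -> np.ndarray:
        """Proximal map of t*h at v (used exactly, without smoothing)."""
        ...

    def grad_f(self, x: np.ndarray) -> np.ndarray:
        """Euclidean gradient of the Lipschitz-smooth term f."""
        ...

    def retract(self, x: np.ndarray, xi: np.ndarray) -> np.ndarray:
        """Retraction of a tangent vector xi at x back onto the manifold."""
        ...

    def proj_tangent(self, x: np.ndarray, g: np.ndarray) -> np.ndarray:
        """Projection of an ambient vector g onto the tangent space at x."""
        ...
```

Any object exposing these four operations, together with the linear map $A$, suffices to drive the update loop sketched in Section 2.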

Potential extensions include:

  • Further automation of parameter adaptation, drawing on insights from Euclidean spectral stepsize selection and curvature-based residual balancing, but redefined using Riemannian distances and tangent-space norms (Xu et al., 2016, Xu et al., 2017, Lorenz et al., 2018).
  • Application to large-scale machine learning models over manifolds with structured nonsmooth regularization—for example, orthogonally-constrained dictionary learning (Li et al., 2022) and distributed statistical estimation—where adaptivity and nonsmooth handling are essential.

7. Significance and Future Directions

ARADMM’s key advance is optimal-order convergence for manifold-constrained nonsmooth composite optimization without smoothing the nonsmooth penalty—a property previously unattained. This distinction is particularly notable for applications where direct handling of sparsity and other structure is crucial and where smoothing would yield biased or suboptimal results.

The adaptive paradigm established in ARADMM is likely to influence future developments in both Riemannian optimization and large-scale nonsmooth learning, especially as practical instantiations increasingly require geometry-aware, structure-exploiting, and parameter-free methodologies. A plausible implication is the emergence of ARADMM as a template for manifold optimization in high-dimensional learning systems where both geometry and nonsmooth structure are prominent (Deng et al., 21 Oct 2025).
