Adaptive Riemannian ADMM for Nonsmooth Problems
- The paper introduces ARADMM, which efficiently solves structured nonsmooth optimization problems on compact Riemannian submanifolds without smoothing the nonsmooth component.
- It employs adaptive parameter tuning that integrates proximal operators with Riemannian gradient and retraction methods to guarantee optimal convergence rates (O(ε⁻³)).
- Empirical results on sparse PCA and robust subspace recovery demonstrate ARADMM’s superior performance compared to smoothing-based approaches.
Adaptive Riemannian Alternating Direction Method of Multipliers (ARADMM) is a class of algorithms designed to solve structured nonsmooth composite optimization problems constrained to compact Riemannian submanifolds embedded in Euclidean space. ARADMM generalizes classical ADMM by handling both nonsmooth terms and manifold constraints, combining proximal operator techniques, Riemannian geometry (retraction, tangent spaces), and adaptive parameter strategies. The intent is to address applications such as sparse principal component analysis and robust subspace recovery without requiring smooth approximation of the nonsmooth regularizer, while achieving optimal or near-optimal iteration complexity (Deng et al., 21 Oct 2025).
1. Mathematical Formulation and Problem Setting
The canonical problem addressed by ARADMM is

$$\min_{x \in \mathcal{M}} \; f(x) + h(Ax),$$

where $f$ is smooth (often with Lipschitz gradient), $h$ is a proper, closed, convex but nonsmooth function (e.g., the $\ell_1$ norm), $A$ is a linear map, and $\mathcal{M}$ is a compact Riemannian submanifold (such as the Stiefel manifold or the sphere). This framework encodes both structural constraints (via $\mathcal{M}$) and regularization or sparsity priors (via $h$).
To decouple the smooth and nonsmooth components, an auxiliary splitting variable $y$ is introduced, reformulating the problem as

$$\min_{x \in \mathcal{M},\, y} \; f(x) + h(y) \quad \text{subject to} \quad Ax = y.$$
The augmented Lagrangian is then

$$\mathcal{L}_{\rho}(x, y, \lambda) = f(x) + h(y) + \langle \lambda, Ax - y \rangle + \frac{\rho}{2}\,\|Ax - y\|^2.$$
This setup allows $h$ to be handled via its proximal operator, and the Riemannian constraint to be managed using geometric optimization methods.
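As a minimal numerical sketch of these two building blocks, the snippet below implements the proximal operator of the $\ell_1$ norm (soft-thresholding) and an evaluation of the augmented Lagrangian above; the function names are illustrative, and the choice $h = \|\cdot\|_1$ is simply the canonical example from the problem setting.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1, i.e., elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def aug_lagrangian(f, h, A, x, y, lam, rho):
    """Augmented Lagrangian L_rho(x, y, lam) for min f(x) + h(y) s.t. A x = y."""
    r = A @ x - y                                   # constraint residual A x - y
    return f(x) + h(y) + lam @ r + 0.5 * rho * (r @ r)
```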
2. Algorithmic Framework and Adaptive Coordination
ARADMM integrates three core iterative components per step (Deng et al., 21 Oct 2025):
- y-update: $y^{k+1} = \mathrm{prox}_{h/\rho_k}\big(Ax^k + \lambda^k/\rho_k\big)$, leveraging the proximal operator of $h$ to exactly solve the nonsmooth subproblem;
- x-update: $x^{k+1} = \mathrm{Retr}_{x^k}\big(-\eta_k\, g_k\big)$, where $g_k = \mathrm{grad}_x \mathcal{L}_{\rho_k}(x^k, y^{k+1}, \lambda^k)$ denotes the Riemannian gradient at $x^k$, $\eta_k$ is the primal stepsize, and $\mathrm{Retr}_{x^k}$ is a retraction mapping enforcing $x^{k+1} \in \mathcal{M}$;
- Dual update: $\lambda^{k+1} = \lambda^k + \sigma_k\,(Ax^{k+1} - y^{k+1})$, with an adaptive dual stepsize $\sigma_k$.
A defining feature is the adaptive selection of penalty and stepsize parameters. The penalty parameter $\rho_k$ is typically chosen to increase with the iteration counter, while the primal stepsize $\eta_k$ is scaled in coordination with $\rho_k$ (shrinking as the penalty grows). The dual stepsize $\sigma_k$ is adapted at every iteration to balance dual progress against constraint violation, directly exploiting observed progress patterns in the iterate sequence.
This adaptive coordination of $(\rho_k, \eta_k, \sigma_k)$ is central to guaranteeing sufficient descent in a suitable potential function, maintaining algorithmic stability and convergence rates, and obviating the need for smooth approximations of $h$.
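A schematic single iteration, assuming the simplest case of the unit sphere (retraction by normalization) and a prox-friendly $h$, is sketched below. The update order follows the description above, but the function `aradmm_step` and the fixed arguments `rho`, `eta`, `sigma` are illustrative placeholders; the actual adaptive schedules of Deng et al. are not reproduced here.

```python
import numpy as np

def aradmm_step(x, y, lam, grad_f, prox_h, A, rho, eta, sigma):
    """One ADMM-style iteration with a Riemannian x-update on the unit sphere."""
    # y-update: exact proximal step, y = prox_{h/rho}(A x + lam / rho)
    y_new = prox_h(A @ x + lam / rho, 1.0 / rho)

    # Euclidean gradient of the augmented Lagrangian in x at (x, y_new, lam)
    egrad = grad_f(x) + A.T @ (lam + rho * (A @ x - y_new))

    # Riemannian gradient on the sphere: project onto the tangent space at x
    rgrad = egrad - (x @ egrad) * x

    # Retraction: move along -eta * rgrad, then renormalize back onto the sphere
    x_new = x - eta * rgrad
    x_new /= np.linalg.norm(x_new)

    # Dual ascent step with (possibly adaptive) stepsize sigma
    lam_new = lam + sigma * (A @ x_new - y_new)
    return x_new, y_new, lam_new
```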
3. Complexity Analysis and Theoretical Guarantees
Under mild assumptions (compactness of $\mathcal{M}$, Lipschitz gradient for $f$, and proper closed convexity of $h$), ARADMM achieves an iteration complexity of $O(\epsilon^{-3})$ for producing an $\epsilon$-approximate KKT point (Deng et al., 21 Oct 2025). Specifically, after $O(\epsilon^{-3})$ iterations, the produced tuple $(x^k, y^k, \lambda^k)$ satisfies the necessary stationarity, feasibility, and dual conditions to within an $\epsilon$ tolerance.
Unlike previous Riemannian ADMM approaches that require smoothing the nonsmooth $h$ (using, for example, Moreau envelopes (Li et al., 2022)), ARADMM achieves this complexity without any such regularization, relying directly on proximal computations. Smoothing-based Riemannian ADMM variants come with their own complexity guarantees, but at the cost of additional parameter selection and approximation error.
Key properties enabling this rate include:
- Each iteration performs only one Riemannian gradient/retraction and one proximal step;
- The adaptive penalty/stepsize coordination ensures synchrony between primal and dual progress despite the underlying nonconvexity (due to the manifold constraint) and nonsmoothness.
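One simple way to monitor the three approximate-KKT conditions numerically is sketched below, again assuming the unit sphere for the tangent-space projection; these residual definitions are common surrogates, not the paper's exact stationarity measures. The dual condition $\lambda \in \partial h(y)$ is checked through the proximal fixed-point identity $y = \mathrm{prox}_h(y + \lambda)$.

```python
import numpy as np

def kkt_residuals(x, y, lam, grad_f, prox_h, A):
    """Numerical surrogates for the epsilon-KKT conditions (sphere case)."""
    # Riemannian stationarity: tangent-space projection of grad f(x) + A^T lam
    egrad = grad_f(x) + A.T @ lam
    stat = np.linalg.norm(egrad - (x @ egrad) * x)

    # Primal feasibility of the splitting constraint A x = y
    feas = np.linalg.norm(A @ x - y)

    # Dual condition lam in the subdifferential of h at y, via the prox fixed point
    dual = np.linalg.norm(y - prox_h(y + lam, 1.0))
    return stat, feas, dual
```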
4. Comparison to Related Approaches
Table: Algorithmic Differences for Riemannian Nonsmooth Optimization
| Method | Handles nonsmooth directly? | Smoothing/Moreau envelope required? | Adaptive parameters? | Iteration complexity | 
|---|---|---|---|---|
| ARADMM (Deng et al., 21 Oct 2025) | Yes | No | Yes | $O(\epsilon^{-3})$ |
| Riemannian ADMM (Li et al., 2022) | No | Yes | Some (manual tuning) | Slower (smoothing-dependent) |
| Classical ADMM (Euclidean) | Yes | No | Yes (recent) | — |
Earlier Riemannian ADMM algorithms required smoothing $h$, increasing the number of parameters (e.g., a smoothing parameter) and possibly degrading the convergence rate (Li et al., 2022). ARADMM avoids this by leveraging adaptive penalty updates and direct use of the proximal map, while still matching or improving the best theoretical rates.
Adaptive strategies in classical Euclidean ADMM—such as spectral stepsize selection, residual balancing, and variable step size updates (Xu et al., 2016, Xu et al., 2017, Bartels et al., 2017, Lorenz et al., 2018, Wang, 17 Apr 2024)—inspire much of the ARADMM parameter updating logic, but extending these ideas to the Riemannian context requires careful handling of curvature, retraction, and non-Euclidean geometry.
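For reference, the classical Euclidean residual-balancing rule reads as follows; this is the standard heuristic from the Euclidean ADMM literature cited above, shown only to illustrate the kind of adaptation that ARADMM transplants to the Riemannian setting, where the residual norms would instead involve tangent-space and retraction-aware quantities. The thresholds `mu` and `tau` are conventional illustrative defaults, not values from the paper.

```python
def residual_balance(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Classical Euclidean residual-balancing update for the ADMM penalty rho."""
    if primal_res > mu * dual_res:   # primal residual dominates: tighten penalty
        return rho * tau
    if dual_res > mu * primal_res:   # dual residual dominates: relax penalty
        return rho / tau
    return rho                       # residuals balanced: keep rho unchanged
```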
5. Applications and Empirical Results
Demonstrated domains include:
- Sparse Principal Component Analysis (PCA): Formulated as maximizing a quadratic form over the Stiefel manifold with an $\ell_1$- or group-sparsity-inducing penalty $h$; a toy single-component instance is sketched after this list. ARADMM achieves lower objective values and faster convergence, in both iteration count and CPU time, compared to state-of-the-art Riemannian ADMM baselines (MADMM, RADMM, OADMM).
- Robust Subspace Recovery / Dual Principal Component Pursuit (DPCP): ARADMM again outperforms alternatives, producing lower objective values and requiring fewer iterations to reach high-accuracy feasibility.
- In these experiments, ARADMM’s adaptive parameter coordination leads to tighter enforcement of constraints (i.e., the splitting residual $\|Ax^k - y^k\|$ is reduced) and better solution quality, empirically confirming the practical utility of the theoretical advances.
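A toy single-component sparse PCA instance, wired through the `aradmm_step` sketch from Section 2, is shown below. It uses the sphere instead of the Stiefel manifold, a random covariance surrogate, and fixed hand-picked parameter schedules, so it illustrates the mechanics only and is not a reproduction of the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 20, 0.1
B = rng.standard_normal((n, n))
C = B @ B.T / n                                   # surrogate covariance matrix

grad_f = lambda x: -2.0 * C @ x                   # f(x) = -x^T C x
prox_h = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - gamma * t, 0.0)
A = np.eye(n)                                     # splitting constraint y = x

x = rng.standard_normal(n)
x /= np.linalg.norm(x)                            # start on the unit sphere
y, lam = x.copy(), np.zeros(n)

for k in range(200):
    rho = 1.0 + 0.5 * k                           # placeholder increasing penalty
    x, y, lam = aradmm_step(x, y, lam, grad_f, prox_h, A,
                            rho=rho, eta=1.0 / rho, sigma=rho)

print("objective:", -x @ C @ x + gamma * np.abs(x).sum(),
      "feasibility:", np.linalg.norm(A @ x - y))
```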
6. Extensions and Implementation Considerations
ARADMM requires only proximal operator access for $h$, Lipschitz gradient evaluation for $f$, and retraction/gradient tools for the manifold $\mathcal{M}$. The theoretical analysis leverages technical ingredients such as bounding the variation of multipliers across changing tangent spaces and exploiting retraction smoothness properties.
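As an illustration of how little manifold machinery is actually required, the hypothetical `StiefelOracle` class below collects the two geometric primitives (tangent-space projection and a QR-based retraction) for the Stiefel manifold; the class and method names are illustrative, not taken from the paper or from any particular library.

```python
import numpy as np

class StiefelOracle:
    """Minimal geometric toolbox for St(n, p) = {X in R^{n x p} : X^T X = I_p}."""

    def project_tangent(self, X, G):
        # Project a Euclidean gradient G onto the tangent space at X:
        # P_X(G) = G - X * sym(X^T G)
        S = X.T @ G
        return G - X @ ((S + S.T) / 2.0)

    def retract(self, X, V):
        # QR-based retraction of X + V back onto the manifold,
        # with sign correction so that diag(R) > 0.
        Q, R = np.linalg.qr(X + V)
        signs = np.sign(np.sign(np.diag(R)) + 0.5)   # maps 0 -> +1
        return Q * signs                              # rescale columns of Q
```

A companion oracle for the objective would expose only a Euclidean gradient of the smooth term and a proximal map for $h$, matching the access pattern described above.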
Potential extensions include:
- Further automation of parameter adaptation, drawing on insights from Euclidean spectral stepsize and curvature-based residual balancing, but redefined using Riemannian distances and tangent-space norms (Xu et al., 2017, Xu et al., 2016, Lorenz et al., 2018).
- Application to large-scale machine learning models over manifolds with structured nonsmooth regularization—for example, orthogonally-constrained dictionary learning (Li et al., 2022) and distributed statistical estimation—where adaptivity and nonsmooth handling are essential.
7. Significance and Future Directions
ARADMM’s key advance is optimal-order convergence for manifold-constrained nonsmooth composite optimization without smoothing the nonsmooth penalty—a property previously unattained. This distinction is particularly notable for applications where direct handling of sparsity and other structure is crucial and where smoothing would yield biased or suboptimal results.
The adaptive paradigm established in ARADMM is likely to influence future developments in both Riemannian optimization and large-scale nonsmooth learning, especially as practical instantiations increasingly require geometry-aware, structure-exploiting, and parameter-free methodologies. A plausible implication is the emergence of ARADMM as a template for manifold optimization in high-dimensional learning systems where both geometry and nonsmooth structure are prominent (Deng et al., 21 Oct 2025).