Regularized Projective Manifold Gradient (RPMG)

Updated 12 January 2026

RPMG is a framework of Riemannian optimization that constructs manifold-aware gradients and offers closed-form proximal updates for regularized objectives.
It applies to matrix manifolds such as the unit sphere, Stiefel manifold, and SO(3), enabling efficient solutions in tasks like rotation regression and spectral clustering.
The method guarantees convergence under convexity and Lipschitz conditions, integrating ADMM and retraction techniques to maintain geometric consistency and enhance performance.

The Regularized Projective Manifold Gradient (RPMG) framework comprises a class of Riemannian optimization techniques designed for smooth and non-smooth regularized objectives over matrix manifolds such as the unit sphere, Stiefel manifold, and the Lie group SO(3). RPMG addresses the challenge of imposing structure-promoting penalties (e.g., sparsity, boundedness, and low-rankness) while respecting the geometry of the underlying manifold. The essential innovation is constructing manifold-aware gradients, accompanied by closed-form proximal updates or penalty-handling via splitting algorithms, thus providing efficient and scalable solutions in settings ranging from deep neural network rotation regression to regularized spectral clustering.

1. Mathematical Formulation and Problem Setting

RPMG encompasses several problem classes unified by the need to minimize a composite objective $F$ :

On the sphere manifold $S^{n-1}$ (Bai et al., 2022):

$\text{minimize} \quad F(x) = g(x) + h(x), \quad \text{subject to } x \in S^{n-1}$

where $g: \mathbb{R}^n \to \mathbb{R}$ is smooth and $\nabla g$ is Lipschitz on $\{\|x\|_2 \leq 1\}$ , $h$ is convex and absolutely homogeneous (e.g., $\ell_1$ -norm, nuclear norm, nuclear-spectral norm).

On the Stiefel manifold $\mathrm{St}(n,K)$ (Zhai et al., 2024):

$\min_{U^\top U = I_K} F(U) = \|A - UU^\top\|_F^2 + \lambda \sum_{i,j} g((UU^\top)_{ij})$

where $A \in \mathbb{R}^{n \times n}$ is a symmetric affinity matrix, $g$ is a convex, differentiable penalty, and $UU^\top$ is constrained to be a rank- $K$ projection.

For rotation regression over $\mathrm{SO}(3)$ (Chen et al., 2021): RPMG constructs Riemannian backpropagation layers to ensure that network outputs for non-Euclidean targets (rotations, spheres) receive gradients adapted to the manifold structure, with regularizers to maintain norm preservation and step-size control.

2. Manifold-Aware Gradient Construction

RPMG employs Riemannian optimization principles to define gradients and update steps that remain on the manifold:

Tangent Space and Projection: For the sphere, tangent vectors satisfy $x^\top v = 0$ . For the Stiefel manifold, the tangent space at $U$ is $\{ \Delta \mid U^\top \Delta + \Delta^\top U = 0 \}$ . Gradients are projected onto these tangent spaces, e.g., $\mathrm{Proj}_{T_U}(W) = W - U\mathrm{sym}(U^\top W)$ , where $\mathrm{sym}(M) = (M + M^\top) / 2$ .
Cayley Transform and Retraction: For Stiefel, feasible search curves are constructed via the Cayley transform:

$U(\tau) = Q(\tau)U, \quad Q(\tau) = (I + \frac{1}{2}\tau W)^{-1} (I - \frac{1}{2}\tau W)$

with $W$ a skew-symmetric matrix derived from the projected gradient.

Proximal Steps and Penalty Handling: On the sphere, a proxy step-size variable $t'$ enables closed-form updates, with monotone control over the actual step-size and tangent update. For Stiefel, ADMM is employed to decouple entry-wise penalties from projection constraints, admitting proximal minimization for the auxiliary variables (Bai et al., 2022, Zhai et al., 2024).

3. Algorithmic Techniques and Variants

3.1 Proxy Step-Size and Closed-Form Updates on the Sphere

The proxy step-size $t'$ and variable $z$ (proximal update) are linked via:

$z = \mathrm{prox}_{|t'|h}(x_k - t' \nabla g(x_k)), \quad v_k = \frac{z}{x_k^\top z} - x_k, \quad t = \frac{t'}{x_k^\top z}$

$x_{k+1} = \frac{z}{\|z\|}$ maintains unit norm.

Monotonicity and line-search: The map $t' \mapsto t = \varphi(t') = t'/c(t')$ is strictly increasing under convex, absolutely homogeneous $h$ . Line search only requires evaluating $g$ due to a model-based surrogate, ensuring efficient backtracking (Bai et al., 2022).

3.2 ADMM over the Stiefel Manifold

Augmented Lagrangian and splitting:

$\mathcal{L}_\rho(X,Y,\Lambda) = \|A - X\|_F^2 + \lambda \sum_{i,j} g(Y_{ij}) + \frac{\rho}{2}\|X - Y\|_F^2 + \langle \Lambda, X - Y \rangle$

where $X = UU^\top \in \mathrm{Proj}_K$ , $Y$ is auxiliary.

The $X$ -update is projection onto rank- $K$ via eigendecomposition; the $Y$ -update is entrywise proximal for the chosen $g$ (bounded, nonnegative, Huber-sparse). Lag multipliers $\Lambda$ are updated as in standard ADMM (Zhai et al., 2024).

3.3 RPMG for Deep Learning on SO(3) and Other Manifolds

Riemannian gradients are derived for $\mathrm{SO}(3)$ regression:

$\mathrm{Proj}_{T_R}(G) = R\, \mathrm{skew}(R^\top G)$

Steps along the geodesic are mapped back to chosen representation spaces (quaternions, 6D, 9D, 10D), with correction vectors

$g_{\mathrm{RPMG}} = (x - x_{\mathrm{gp}}) + \lambda(x_{\mathrm{gp}} - \hat{x}_g)$

where $x_{\mathrm{gp}}$ is the minimal correction to $x$ aligning with the geodesic step target and $\lambda$ promotes norm stability (Chen et al., 2021).

4. Penalty Functions and Proximal Operators

RPMG frameworks accommodate several classes of structure-inducing penalty functions:

Penalty Type	Mathematical Form	Proximal Operator / Update
$\ell_1$ -norm	$h(x) = \lambda\\|x\\|_1$	Soft-thresholding: $\mathrm{sign}(w_i)\max(\|w_i\| - t\lambda, 0)$
Nuclear norm	$h(x) = \lambda\\|X\\|_*$ , $X = \mathrm{mat}(x)$	Singular value soft-thresh.: $U \mathrm{diag}((\sigma_i - t\lambda)_+ ) V^\top$
Bounded penalty	$g_{\alpha,\beta}(z) = [\min(z-\alpha,0)]^2 + [\min(\beta-z,0)]^2$	Proximal via projection and scaling
Huber sparsity	$g_\delta(z) = \frac{1}{2}z^2/\delta$ if $\|z\| \leq \delta$ , $\|z\|-\delta/2$ otherwise	Piecewise proximal, depends on $\delta$

All cases rely on the proximal operator being available in closed form (Bai et al., 2022, Zhai et al., 2024).

5. Theoretical Guarantees

Convergence: Under convexity, absolute homogeneity, and Lipschitz gradient conditions, RPMG frameworks guarantee that the objective is nonincreasing and that iterates converge to KKT points or stationary points of the manifold-constrained objective. For instance, on the sphere manifold,

$F(x_{k+1}) \leq F(x_k) - \frac{1}{2t}\|v_k\|^2$

with $\|v_k\| \to 0$ , and the critical point satisfies

$0 \in \mathrm{grad} g(x_*) + \mathrm{Proj}_{T_{x_*}S}\partial h(x_*)$

(Bai et al., 2022). With ADMM and Stiefel manifold projection, global convergence to KKT points is ensured under appropriate spectral gap and penalty smoothness (Zhai et al., 2024).

Complexity: Each RPMG iteration typically involves a gradient computation, a proximal or eigendecomposition step, and possibly several line-search or ADMM inner loops. For instance, Stiefel manifold updates require eigendecomposition (cost $O(n^3)$ or $O(nK^2)$ ), proximal operators cost $O(n)$ or $O(\min(p,q)pq)$ for low-rank matrix problems (Zhai et al., 2024, Bai et al., 2022).

6. Empirical Results and Applications

Spectral Clustering and Community Detection: On affinity graph clustering, RPMG combined with Huber-sparsity or bounded entry penalties achieves up to 20% absolute gain in ACC/NMI over spectral, SDP-1/2, and SLSA baselines, on benchmarks such as stochastic block models, handwritten digits, and UCI datasets (Zhai et al., 2024).
Computer Vision Regularization: When applied to computer vision tasks with nuclear and $\ell_1$ norm regularizations over spherical constraints, RPMG demonstrates consistent performance improvements in all cases tested (Bai et al., 2022).
Deep Rotation Regression: Integrating RPMG as a backward-pass layer in SO(3) regression reduces median pose errors by 2–6x compared to standard projection and loss strategies. For example, in ModelNet-40, vanilla 6D representation yields median error 4.67°, while RPMG-6D achieves 2.07° and increases 5°-accuracy from 54% to 93.6%. Key to these gains is the regularization against norm collapse and the use of Riemannian gradient steps along the geodesic (Chen et al., 2021).

7. Practical Considerations and Extensions

Initialization: Empirically, initializing with the rank- $K$ projection of $A$ (top- $K$ eigenvectors) or suitable manifold projection is standard.
Parameter Tuning: For penalties, parameters such as the Huber $\delta$ ( $10^{-3}–10^{-5}$ ), $\lambda$ ($0.1–1$), and ADMM coupling $\rho=3\lambda \ell, \ell=1/\delta$ are typical (Zhai et al., 2024).
Acceleration: Nesterov-style momentum can be incorporated by evaluating the proximal step at an auxiliary point and forming a retraction-based combination.
Extensibility: The RPMG paradigm extends to other matrix manifolds (e.g., spheres $S^2$ ) provided that manifold projections, tangent spaces, retraction maps, and proximal operators are available in closed form (Chen et al., 2021).
Stopping Criteria: Algorithms terminate on reaching $\|X - Y\|_F < 1\text{e}{-6}$ or maximum iterations, and in deep settings, norm stability of representation is a critical diagnostic (Zhai et al., 2024, Chen et al., 2021).

A plausible implication is that the RPMG methodology provides a general framework for solving regularized manifold-constrained optimization problems where standard Euclidean methods would fail to enforce geometric or algebraic constraints intrinsic to the problem domain. The combination of closed-form updates, global convergence, and compatibility with various regularizations underscores RPMG's practical relevance across manifold-aware learning and matrix optimization.