
Riemannian Sharpness-Aware Minimization (RSAM)

Updated 21 April 2026
  • RSAM is a family of optimization techniques that generalizes SAM by integrating intrinsic Riemannian geometry into the loss minimization process.
  • It utilizes differential geometry tools such as retraction, projection, and intrinsic metrics to guide adversarial perturbations on manifold-constrained spaces.
  • RSAM achieves enhanced generalization and stability with minimal computational overhead, yielding flatter minima and improved performance on benchmarks.

Riemannian Sharpness-Aware Minimization (RSAM) encompasses a family of methodologies generalizing sharpness-aware optimization by incorporating notions of local or intrinsic geometry, particularly Riemannian metrics and manifolds, into the minimization of loss sharpness. These advances address the limitations of Euclidean, parameterization-dependent sharpness-aware methods, and provide frameworks suited for constrained or geometrically structured parameter spaces. Two main strands have emerged in recent literature: geometric reparameterization-invariant sharpness-aware minimization, and intrinsic manifold-constrained RSAM.

1. Objectives and Conceptual Framework

RSAM extends the conventional Sharpness-Aware Minimization (SAM)—which minimizes the worst-case loss within a neighborhood of parameters—by reformulating neighborhoods, distances, and gradients using Riemannian geometry instead of Euclidean space. This yields methods that optimize parameters constrained to Riemannian manifolds or adapt the adversarial perturbation direction using a Riemannian metric intrinsically tied to the loss geometry.

Given a manifold $\mathcal{M}\subset\mathbb{R}^k$ with an intrinsic (often problem-induced) dimension $d$, and empirical risk $L_\mathcal{S}(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(f_\theta(x_i),y_i)$, the generic RSAM objective is

$$\min_{\theta\in\mathcal{M}} \; \max_{\theta'\in\mathcal{B}_\theta(\rho)} L_\mathcal{S}(\theta')$$

with $\mathcal{B}_\theta(\rho)$ the Riemannian ball of radius $\rho$ centered at $\theta$ (Truong et al., 2023).

2. Geometric Foundations

RSAM requires tools from differential geometry:

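  • Riemannian Metric: Inner product $\langle\cdot,\cdot\rangle_\theta$ defined on the tangent space $T_\theta\mathcal{M}$ at each $\theta$.
  • Retraction: Map $R_\theta: T_\theta\mathcal{M}\to\mathcal{M}$ approximating the exponential map, with $R_\theta(0)=\theta$ and $\mathrm{D}R_\theta(0)=\mathrm{id}_{T_\theta\mathcal{M}}$.
  • Projection: For embedded submanifolds, the ambient Euclidean gradient $\nabla L_\mathcal{S}(\theta)$ is projected to $T_\theta\mathcal{M}$ to yield the Riemannian gradient $\mathrm{grad}\,L_\mathcal{S}(\theta)$ via an orthogonal projection (Truong et al., 2023).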

This geometric machinery defines neighborhoods and gradient-based optimization steps. For statistical learning on quotient manifolds or the Stiefel manifold (e.g., orthogonality constraints on weight matrices), these operations are realized in closed form (e.g., QR retraction, symmetric projection).
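As an illustration of the closed-form operations mentioned above, the following is a minimal NumPy sketch for the Stiefel manifold under the embedded Euclidean metric. The function names `stiefel_project` and `qr_retract` are hypothetical and not taken from the cited papers; the formulas are the standard symmetric tangent projection and QR-based retraction.

```python
import numpy as np

def stiefel_project(X, G):
    """Project an ambient matrix G onto the tangent space at X.

    Assumes X has orthonormal columns (X^T X = I) and the embedded metric,
    so that P_X(G) = G - X sym(X^T G).
    """
    XtG = X.T @ G
    sym = 0.5 * (XtG + XtG.T)
    return G - X @ sym

def qr_retract(X, V):
    """QR retraction: the Q factor of X + V, with signs fixed so diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs  # rescales each column of Q by +1 or -1

# Small demonstration on 5 x 3 matrices with orthonormal columns.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((5, 3)))
G = rng.standard_normal((5, 3))   # stands in for an ambient Euclidean gradient
xi = stiefel_project(X, G)        # Riemannian gradient under the embedded metric
Y = qr_retract(X, 0.1 * xi)       # point on the manifold after a tangent step
```

The sign correction makes the QR factorization unique, which is what gives `qr_retract(X, 0)` equal to `X` (the retraction property $R_\theta(0)=\theta$).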

3. Algorithmic Procedures

RSAM algorithms generally decompose each iteration into

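  • Inner maximization: ("teleportation" step) Find the worst-case perturbation within a Riemannian ball by first-order Taylor expansion. At iterate $\theta_t$, this selects the steepest-ascent direction of the Riemannian gradient, scaled to the ball radius:

$$\delta_t = \rho\,\frac{\mathrm{grad}\,L_\mathcal{S}(\theta_t)}{\|\mathrm{grad}\,L_\mathcal{S}(\theta_t)\|_{\theta_t}}$$

followed by the retraction $\theta_t^{\mathrm{adv}} = R_{\theta_t}(\delta_t)$, where $\|v\|_\theta = \sqrt{\langle v, v\rangle_\theta}$, so that the metric $\langle\cdot,\cdot\rangle_\theta$ encodes local metric scaling.

  • Outer update: ("sharpness-aware descent") Take a gradient descent step with learning rate $\eta$, using the gradient evaluated at the perturbed point and mapping back via retraction:

$$\theta_{t+1} = R_{\theta_t}\!\left(-\eta\,\mathrm{grad}\,L_\mathcal{S}(\theta_t^{\mathrm{adv}})\right)$$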

(Truong et al., 2023).
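The two-stage iteration can be sketched on the simplest embedded manifold, the unit sphere, with a toy Rayleigh-quotient loss. This is an illustrative sketch and not the authors' implementation: the sphere, the loss, the names `project`, `retract`, and `rsam_step`, and the default values of `rho` and `lr` are all assumptions. In particular, projecting the gradient computed at the perturbed point onto the tangent space of the current iterate is one plausible choice made here for simplicity.

```python
import numpy as np

def project(theta, g):
    """Orthogonal projection of g onto the tangent space of the unit sphere at theta."""
    return g - np.dot(theta, g) * theta

def retract(theta, v):
    """Projection retraction on the sphere: step in the ambient space, then renormalize."""
    y = theta + v
    return y / np.linalg.norm(y)

def rsam_step(theta, grad_fn, rho=0.05, lr=0.1, eps=1e-12):
    # Inner maximization: first-order worst case inside the ball of radius rho.
    rgrad = project(theta, grad_fn(theta))
    delta = rho * rgrad / (np.linalg.norm(rgrad) + eps)
    theta_adv = retract(theta, delta)
    # Outer update: descend along the gradient taken at the perturbed point
    # (assumption: it is projected onto the tangent space at theta).
    rgrad_adv = project(theta, grad_fn(theta_adv))
    return retract(theta, -lr * rgrad_adv)

# Toy problem: minimize theta^T A theta over the unit sphere.
A = np.diag([1.0, 2.0, 3.0, 4.0])

def loss(theta):
    return float(theta @ A @ theta)

def grad(theta):
    return 2.0 * A @ theta

theta0 = np.ones(4) / 2.0          # unit-norm starting point
theta = theta0.copy()
for _ in range(200):
    theta = rsam_step(theta, grad)

print(loss(theta0), loss(theta))
```

Each step costs two gradient evaluations, matching the two forward/backward passes of Euclidean SAM; the projection and retraction are the only geometry-specific additions.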

In Riemannian parameterizations induced by the loss geometry (as in Monge SAM (Jacobsen et al., 12 Feb 2025)), the metric is $G(\theta) = I + \nabla L_\mathcal{S}(\theta)\,\nabla L_\mathcal{S}(\theta)^\top$, making both the perturbation norm and direction adapt to the local slope.

4. Reparameterization-Invariant and Loss-Induced (Monge) Metrics

A key RSAM approach is to define the Riemannian metric from the embedding of the parameter manifold into the loss surface, yielding reparameterization invariance. The pullback metric is

$$G(\theta) = I + \nabla L_\mathcal{S}(\theta)\,\nabla L_\mathcal{S}(\theta)^\top$$

with the adversarial step

$$\delta^\ast = \rho\,\frac{\nabla L_\mathcal{S}(\theta)}{\|\nabla L_\mathcal{S}(\theta)\|_2\,\sqrt{1+\|\nabla L_\mathcal{S}(\theta)\|_2^2}}$$

This approach, termed Monge SAM (M-SAM), interpolates smoothly between SAM and vanilla gradient descent (GD): as $\|\nabla L_\mathcal{S}(\theta)\|_2 \to 0$, $\delta^\ast \approx \rho\,\nabla L_\mathcal{S}(\theta)/\|\nabla L_\mathcal{S}(\theta)\|_2$, recovering SAM; as $\|\nabla L_\mathcal{S}(\theta)\|_2 \to \infty$, the step vanishes, recovering GD (Jacobsen et al., 12 Feb 2025). This metric yields a closed-form, invariant adversarial step and is reported to improve robustness to hyperparameters as well as saddle-point escape properties.
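The interpolation between SAM and GD can be checked numerically. The sketch below assumes the Monge metric $G = I + \nabla L\,\nabla L^\top$ and the steepest-ascent step $\rho\,G^{-1}\nabla L / \sqrt{\nabla L^\top G^{-1}\nabla L}$ inside the ball $\|\delta\|_G \le \rho$; the function names are hypothetical. It compares the closed form against an explicit linear solve and evaluates both limits.

```python
import numpy as np

def sam_perturbation(grad, rho):
    """Euclidean SAM step: rho times the normalized gradient."""
    return rho * grad / np.linalg.norm(grad)

def monge_perturbation(grad, rho):
    """Closed-form step under the assumed metric G = I + grad grad^T."""
    n = np.linalg.norm(grad)
    if n == 0.0:
        return np.zeros_like(grad)
    return rho * grad / (n * np.sqrt(1.0 + n * n))

def monge_perturbation_explicit(grad, rho):
    """Same step computed from the metric directly, without the closed form."""
    G = np.eye(grad.size) + np.outer(grad, grad)
    ginv_grad = np.linalg.solve(G, grad)          # G^{-1} grad
    return rho * ginv_grad / np.sqrt(grad @ ginv_grad)

rho = 0.1
g = np.array([0.3, -1.2, 2.0])
d_closed = monge_perturbation(g, rho)
d_explicit = monge_perturbation_explicit(g, rho)

g_small = 1e-4 * np.array([3.0, 4.0])   # nearly flat region: behaves like SAM
g_large = 1e4 * np.array([3.0, 4.0])    # steep region: perturbation shrinks toward zero
d_small = monge_perturbation(g_small, rho)
d_large = monge_perturbation(g_large, rho)
```

The closed form follows from the Sherman-Morrison identity, which gives $G^{-1}\nabla L = \nabla L/(1+\|\nabla L\|_2^2)$, so no linear solve is needed in practice.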

5. Theoretical Guarantees

Theoretical analyses of RSAM provide generalization bounds leveraging Riemannian neighborhoods and PAC-Bayes concentration, typically tightening the dependence to the parameter space's intrinsic dimension $d$ rather than the ambient parameter count $k$. Specifically, for RSAM with retraction $R$ and ball radius $\rho$, the bound includes a constant reflecting geometric aspects of the retraction, with an explicit value available for the Stiefel manifold with QR retraction (Truong et al., 2023).

For Monge SAM, invariance under reparameterization holds formally: for every smooth chart, the pullback metric guarantees that the adversarial step and corresponding descent trajectory commute with coordinate changes (Jacobsen et al., 12 Feb 2025).

6. Practical Implementations and Empirical Evaluation

RSAM algorithms:

  • Maintain the computational cost of SAM (two forward/backward passes), with modest additional overhead, even when enforcing geometric constraints such as orthogonality (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).
  • Require essentially the same hyperparameters as Euclidean SAM: perturbation radius $\rho$ and learning rate $\eta$.
  • Are reported to improve generalization and train–test stability, including flatter minima (as indicated by smaller Hessian eigenvalues relative to SAM), larger test performance gains (+1–3% in supervised classification, +1–5% in contrastive learning on vision benchmarks), and greater robustness to hyperparameter misspecification.

Empirical protocols use benchmarks such as CIFAR-10, CIFAR-100, and FGVC-Aircraft, with architectures including ResNet-34/50 constrained via Stiefel manifold projections.

| Model/Task | CE+SGD | CE+SAM | CE+RSAM | SupCon+SGD | SupCon+SAM | SupCon+RSAM |
|---|---|---|---|---|---|---|
| CIFAR-10 (ResNet-34) | | | | | | |
| CIFAR-100 | +1–3% | +1–3% | +1–3% | +1–5% | +1–5% | +1–5% |

Observed sharpness of RSAM-trained models, as measured by the maximal Hessian eigenvalue, is consistently reduced relative to SAM (Truong et al., 2023).
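The source does not state how the maximal Hessian eigenvalue is computed. One common approach, sketched here as an assumption, is power iteration on Hessian-vector products obtained from finite differences of the gradient. The names `hvp` and `top_hessian_eigenvalue` are hypothetical, and the quadratic test loss is chosen only because its Hessian spectrum is known.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-3):
    """Hessian-vector product via central finite differences of the gradient."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2.0 * eps)

def top_hessian_eigenvalue(grad_fn, theta, iters=100, seed=0):
    """Power iteration; converges to the eigenvalue of largest magnitude."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = hvp(grad_fn, theta, v)
        lam = float(v @ hv)            # Rayleigh quotient for the current direction
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return lam

# Toy check: L(theta) = 0.5 * theta^T A theta has Hessian A.
A = np.diag([1.0, 2.0, 4.0])

def quad_grad(theta):
    return A @ theta

lam_max = top_hessian_eigenvalue(quad_grad, np.zeros(3))
print(lam_max)
```

For a neural network, `grad_fn` would be replaced by a mini-batch gradient or an automatic-differentiation Hessian-vector product; near a minimum the Hessian is close to positive semidefinite, so the largest-magnitude eigenvalue coincides with the largest eigenvalue.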

7. Connections, Limitations, and Ongoing Directions

RSAM provides a unifying geometric perspective on sharpness-aware training, generalizing previous approaches in loss-induced (Monge) Riemannian metrics (Jacobsen et al., 12 Feb 2025), manifold-constrained learning (Truong et al., 2023), and the pursuit of reparameterization invariance.

Notably, the term "RSAM" has been used in Euclidean, randomized smoothing (random-SAM) contexts as well (Khanh et al., 2024); however, those variants do not employ Riemannian geometry but rather exploit stochastic perturbation in Euclidean space.

Open research avenues include extending RSAM to more complex quotient geometries, exploring spectral regularization (e.g., Rényi sharpness), and applying RSAM in domains with inherent geometric constraints (e.g., equivariant networks, structured parameterizations).

In summary, Riemannian Sharpness-Aware Minimization advances sharpness-aware optimization by adapting adversarial perturbations and robust descent steps to the intrinsic geometry of the parameter space or loss landscape, yielding improved generalization, stability, and invariance properties compared to classical approaches (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).
