Riemannian Sharpness-Aware Minimization (RSAM)

Updated 21 April 2026

RSAM is a family of optimization techniques that generalizes SAM by integrating intrinsic Riemannian geometry into the loss minimization process.
It utilizes differential geometry tools such as retraction, projection, and intrinsic metrics to guide adversarial perturbations on manifold-constrained spaces.
RSAM is reported to improve generalization and stability with little computational overhead, yielding flatter minima and improved performance on benchmarks.

Riemannian Sharpness-Aware Minimization (RSAM) encompasses a family of methodologies generalizing sharpness-aware optimization by incorporating notions of local or intrinsic geometry, particularly Riemannian metrics and manifolds, into the minimization of loss sharpness. These advances address the limitations of Euclidean, parameterization-dependent sharpness-aware methods, and provide frameworks suited for constrained or geometrically structured parameter spaces. Two main strands have emerged in recent literature: geometric reparameterization-invariant sharpness-aware minimization, and intrinsic manifold-constrained RSAM.

1. Objectives and Conceptual Framework

RSAM extends the conventional Sharpness-Aware Minimization (SAM)—which minimizes the worst-case loss within a neighborhood of parameters—by reformulating neighborhoods, distances, and gradients using Riemannian geometry instead of Euclidean space. This yields methods that optimize parameters constrained to Riemannian manifolds or adapt the adversarial perturbation direction using a Riemannian metric intrinsically tied to the loss geometry.

Given a manifold $\mathcal{M}\subset\mathbb{R}^k$ with intrinsic (often problem-induced) dimension $d$ and empirical risk $L_\mathcal{S}(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(f_\theta(x_i),y_i)$, the generic RSAM objective is

$\min_{\theta\in\mathcal{M}} \; \max_{\theta'\in\mathcal{B}_\theta(\rho)} L_\mathcal{S}(\theta')$

with $\mathcal{B}_\theta(\rho)$ the Riemannian ball of radius $\rho$ centered at $\theta$ (Truong et al., 2023).
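The following toy illustration is not taken from the cited works. On the unit circle $S^1$, with the angle $\varphi$ as coordinate, the Riemannian ball $\mathcal{B}_\varphi(\rho)$ is the geodesic arc $[\varphi-\rho,\varphi+\rho]$, so the inner maximum can be brute-forced on a grid. The two-well loss below, the grid size `N`, and `rho` are hypothetical choices. The plain minimizer lands in the deep, sharp well at 0, while the sharpness-aware objective selects the shallower, flat well at $\pi$.

```python
import numpy as np


def angular_dist(phi, c):
    """Geodesic (arc-length) distance on the unit circle S^1."""
    return np.abs((phi - c + np.pi) % (2 * np.pi) - np.pi)


def loss(phi):
    """Hypothetical toy loss: a deep, sharp well at 0 and a shallower, flat well at pi."""
    sharp = -1.0 * np.exp(-(angular_dist(phi, 0.0) / 0.05) ** 2)
    flat = -0.8 * np.exp(-(angular_dist(phi, np.pi) / 0.8) ** 2)
    return sharp + flat


N = 2000                                    # grid resolution (assumption)
phi = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
L = loss(phi)

rho = 0.2                                   # geodesic radius of the Riemannian ball
w = int(rho / (2 * np.pi / N))              # grid steps that fit inside the ball
# Worst-case loss over the arc [phi - rho, phi + rho]; np.roll wraps around the circle.
robust = np.max(np.stack([np.roll(L, j) for j in range(-w, w + 1)]), axis=0)

phi_plain = phi[np.argmin(L)]               # ordinary minimizer
phi_robust = phi[np.argmin(robust)]         # minimizer of the sharpness-aware objective
print(phi_plain, phi_robust)
```

Brute force only scales to one or two dimensions. The algorithms in Section 3 replace the inner maximum with a first-order approximation.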

2. Geometric Foundations

RSAM requires tools from differential geometry:

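Riemannian Metric: Inner product $\langle\cdot,\cdot\rangle_\theta$ defined on the tangent space $T_\theta\mathcal{M}$ at each $\theta$.
Retraction: A map $R_\theta: T_\theta\mathcal{M}\to\mathcal{M}$ approximating the exponential map, with $R_\theta(0_\theta)=\theta$ and $\mathrm{D}R_\theta(0_\theta)=\mathrm{id}_{T_\theta\mathcal{M}}$.
Projection: For embedded submanifolds equipped with the induced metric, the ambient Euclidean gradient $\nabla L_\mathcal{S}(\theta)$ is orthogonally projected onto $T_\theta\mathcal{M}$ to yield the Riemannian gradient $\operatorname{grad} L_\mathcal{S}(\theta) = \mathrm{P}_\theta\big(\nabla L_\mathcal{S}(\theta)\big)$ (Truong et al., 2023).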

This geometric machinery defines neighborhoods and gradient-based optimization steps. For statistical learning on quotient manifolds or the Stiefel manifold (e.g., orthogonality constraints on weight matrices), these operations are realized in closed form (e.g., QR retraction, symmetric projection).
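The two Stiefel primitives mentioned above can be sketched in NumPy as follows. The sketch assumes $\mathrm{St}(n,p)=\{X\in\mathbb{R}^{n\times p}: X^\top X = I_p\}$ with the metric induced by the Euclidean embedding. The function names are hypothetical and do not come from the cited works. The tangent projection is $\mathrm{P}_X(Z) = Z - X\,\mathrm{sym}(X^\top Z)$, and the QR retraction takes the Q factor of $X+V$ with the diagonal of R made positive.

```python
import numpy as np


def sym(A):
    """Symmetric part of a square matrix."""
    return 0.5 * (A + A.T)


def stiefel_proj(X, Z):
    """Orthogonal projection of an ambient n x p matrix Z onto T_X St(n, p)."""
    return Z - X @ sym(X.T @ Z)


def qr_retraction(X, V):
    """QR retraction R_X(V) = qf(X + V), normalised so that diag(R) > 0."""
    Q, R = np.linalg.qr(X + V)          # reduced QR: Q is n x p
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0                     # guard against exactly-zero diagonal entries
    return Q * s                        # flip column signs to make the factorisation unique


rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))        # a point on St(6, 3)
V = stiefel_proj(X, rng.standard_normal((6, 3)))        # a tangent vector at X
Y = qr_retraction(X, 0.1 * V)                           # back on the manifold
```

The sign normalisation makes the retraction satisfy $R_X(0)=X$ exactly, matching the retraction definition above.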

3. Algorithmic Procedures

RSAM algorithms generally decompose each iteration into

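Inner maximization ("teleportation" step): Approximate the worst-case perturbation within the Riemannian ball by a first-order Taylor expansion. The maximizer points along the Riemannian gradient, rescaled to the boundary of the ball, and is then mapped back to the manifold via retraction:

$\epsilon^\star = \arg\max_{\|\epsilon\|_\theta\le\rho}\,\langle\operatorname{grad} L_\mathcal{S}(\theta),\epsilon\rangle_\theta = \rho\,\frac{\operatorname{grad} L_\mathcal{S}(\theta)}{\|\operatorname{grad} L_\mathcal{S}(\theta)\|_\theta}$

followed by $\tilde\theta = R_\theta(\epsilon^\star)$, where $\|v\|_\theta = \sqrt{\langle v,v\rangle_\theta} = \sqrt{v^\top G_\theta v}$ and the matrix $G_\theta$ encodes the local metric scaling.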

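Outer update ("sharpness-aware descent"): Take a descent step from $\theta_t$ using the gradient evaluated at the perturbed point $\tilde\theta_t = R_{\theta_t}(\epsilon^\star)$, brought back to $T_{\theta_t}\mathcal{M}$ and mapped onto the manifold via retraction:

$\theta_{t+1} = R_{\theta_t}\!\big(-\eta\,\mathrm{P}_{\theta_t}\big(\nabla L_\mathcal{S}(\tilde\theta_t)\big)\big)$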

(Truong et al., 2023).
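Putting the inner and outer steps together, one RSAM iteration on the Stiefel manifold might look like the minimal sketch below. It makes three assumptions: it uses the embedded (Frobenius) metric, it uses the QR retraction, and it brings the perturbed-point gradient back to $\theta_t$ by tangent projection rather than some other vector transport. The toy loss $-\tfrac12\operatorname{tr}(X^\top A X)$, whose minimizers span the top-$p$ eigenvectors of $A$, and all names and hyperparameter values are illustrative. They are not the experimental setup of Truong et al. (2023).

```python
import numpy as np


def stiefel_proj(X, Z):
    """Project ambient Z onto T_X St(n, p): Z - X sym(X^T Z)."""
    S = X.T @ Z
    return Z - X @ (0.5 * (S + S.T))


def qr_retraction(X, V):
    """qf(X + V) with positive diagonal in R."""
    Q, R = np.linalg.qr(X + V)
    s = np.sign(np.diag(R))
    s[s == 0] = 1.0
    return Q * s


def rsam_step(X, euclid_grad, rho, lr):
    """One hypothetical RSAM iteration: ascend to the perturbed point, then descend from X."""
    G = stiefel_proj(X, euclid_grad(X))              # Riemannian gradient at X
    gnorm = np.linalg.norm(G)                        # Frobenius norm = embedded metric norm
    eps = rho * G / gnorm if gnorm > 1e-12 else np.zeros_like(X)
    X_adv = qr_retraction(X, eps)                    # inner maximisation ("teleportation")
    G_adv = stiefel_proj(X, euclid_grad(X_adv))      # perturbed gradient, projected to T_X
    return qr_retraction(X, -lr * G_adv)             # outer sharpness-aware descent


A = np.diag([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])          # toy problem data (assumption)


def loss(X):
    return -0.5 * np.trace(X.T @ A @ X)


def euclid_grad(X):
    return -A @ X


rng = np.random.default_rng(0)
X0, _ = np.linalg.qr(rng.standard_normal((6, 2)))
X = X0.copy()
for _ in range(300):
    X = rsam_step(X, euclid_grad, rho=0.05, lr=0.1)
print(loss(X0), loss(X))   # the optimum of this toy loss is -0.5 * (6 + 5) = -5.5
```

Each iteration costs two gradient evaluations, as in Euclidean SAM. The projections and QR factorisations are the extra geometric work.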

In Riemannian parameterizations induced by the loss geometry (as in Monge SAM (Jacobsen et al., 12 Feb 2025)), the metric is $G_\theta = I + \alpha^2\,\nabla L_\mathcal{S}(\theta)\,\nabla L_\mathcal{S}(\theta)^\top$ with a scale parameter $\alpha \ge 0$, so both the norm and the direction of the perturbation adapt to the local slope.

4. Reparameterization-Invariant and Loss-Induced (Monge) Metrics

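A key RSAM approach is to define the Riemannian metric from the embedding of the parameter manifold into the loss surface, which yields reparameterization invariance. Pulling back the Euclidean metric through the graph embedding $\theta \mapsto (\theta, \alpha L_\mathcal{S}(\theta))$, with scale $\alpha \ge 0$, gives the metric

$G_\theta = I + \alpha^2\,\nabla L_\mathcal{S}(\theta)\,\nabla L_\mathcal{S}(\theta)^\top$

with the adversarial step

$\epsilon^\star = \rho\,\frac{G_\theta^{-1}\nabla L_\mathcal{S}(\theta)}{\sqrt{\nabla L_\mathcal{S}(\theta)^\top G_\theta^{-1}\nabla L_\mathcal{S}(\theta)}} = \frac{\rho}{\sqrt{1+\alpha^2\|\nabla L_\mathcal{S}(\theta)\|^2}}\,\frac{\nabla L_\mathcal{S}(\theta)}{\|\nabla L_\mathcal{S}(\theta)\|}$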
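Write $g = \nabla L_\mathcal{S}(\theta)$ for short, with the metric $G_\theta = I + \alpha^2 g g^\top$. The closed form then follows from two standard facts. First, maximizing the linear term $g^\top\epsilon$ over the ellipsoid $\epsilon^\top G_\theta\epsilon\le\rho^2$ gives $\epsilon\propto G_\theta^{-1}g$ (Cauchy–Schwarz in the $G_\theta$ inner product). Second, the Sherman–Morrison formula inverts the rank-one update:

```latex
\begin{aligned}
G_\theta^{-1} &= I - \frac{\alpha^2\, g g^\top}{1 + \alpha^2\|g\|^2}
  &&\Rightarrow\quad G_\theta^{-1} g = \frac{g}{1 + \alpha^2\|g\|^2},\\
g^\top G_\theta^{-1} g &= \frac{\|g\|^2}{1 + \alpha^2\|g\|^2}
  &&\Rightarrow\quad \epsilon^\star = \rho\,\frac{G_\theta^{-1} g}{\sqrt{g^\top G_\theta^{-1} g}}
  = \frac{\rho}{\sqrt{1 + \alpha^2\|g\|^2}}\,\frac{g}{\|g\|}.
\end{aligned}
```

The factor $1/\sqrt{1+\alpha^2\|g\|^2}$ shrinks the perturbation where the loss is steep, so no matrix inverse is needed in practice.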

This approach, termed Monge SAM (M-SAM), interpolates smoothly between SAM and vanilla gradient descent (GD). As the metric scale $\alpha\to 0$, $\epsilon^\star \to \rho\,\nabla L_\mathcal{S}(\theta)/\|\nabla L_\mathcal{S}(\theta)\|$, recovering SAM. As $\alpha\to\infty$, $\epsilon^\star\to 0$ and the step vanishes, recovering GD (Jacobsen et al., 12 Feb 2025). This metric yields a closed-form, reparameterization-invariant adversarial step and is reported to improve robustness to hyperparameters as well as saddle-point escape.
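The closed-form M-SAM perturbation is cheap to compute. The sketch below checks it against a direct linear solve with the full metric $G_\theta = I + \alpha^2 g g^\top$ and confirms the two limits described above. The helper names and the random test gradient are hypothetical.

```python
import numpy as np


def monge_sam_perturbation(g, rho, alpha):
    """Closed-form M-SAM adversarial step for G = I + alpha^2 g g^T."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    return rho * g / (gnorm * np.sqrt(1.0 + alpha**2 * gnorm**2))


def riemannian_ascent(g, G, rho):
    """Reference: maximise g^T eps subject to eps^T G eps <= rho^2 via a linear solve."""
    v = np.linalg.solve(G, g)
    return rho * v / np.sqrt(g @ v)


rng = np.random.default_rng(0)
g = rng.standard_normal(4)          # stand-in for the loss gradient
rho = 0.05

for alpha in (0.0, 0.5, 3.0):
    G = np.eye(4) + alpha**2 * np.outer(g, g)
    eps = monge_sam_perturbation(g, rho, alpha)
    assert np.allclose(eps, riemannian_ascent(g, G, rho))   # closed form matches the solve
    assert np.isclose(eps @ G @ eps, rho**2)                # step lies on the ball's boundary
```

With `alpha = 0` the function returns the usual SAM step $\rho\,g/\|g\|$, and for very large `alpha` it returns an almost-zero step.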

5. Theoretical Guarantees

Theoretical analyses of RSAM provide generalization bounds that leverage Riemannian neighborhoods and PAC-Bayes concentration. These bounds typically replace a dependence on the ambient parameter count $k$ with a dependence on the intrinsic dimension $d$ of the parameter space. Specifically, for RSAM with retraction $R$ and ball radius $\rho$, the population risk is bounded by the worst-case empirical risk over $\mathcal{B}_\theta(\rho)$ plus a complexity term governed by $d$. The bound involves a constant that reflects geometric aspects of the retraction, and the analysis gives its value for the Stiefel manifold with the QR retraction (Truong et al., 2023).

For Monge SAM, invariance under reparameterization holds formally: for every smooth chart (reparameterization) $\psi = \phi(\theta)$, the pullback metric guarantees that the adversarial step and the corresponding descent trajectory commute with the coordinate change (Jacobsen et al., 12 Feb 2025).
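A small numerical check of this property follows. It assumes the metric transforms as a tensor under a linear chart $\theta = A\psi$, which is a special case of the smooth charts above. In $\psi$-coordinates the gradient becomes $A^\top\nabla L_\mathcal{S}$ and the metric becomes $A^\top G_\theta A$. The resulting step maps back onto the $\theta$-space step, whereas the Euclidean SAM step does not. The matrix `A`, the gradient `g`, and the values of `rho` and `alpha` are arbitrary test values.

```python
import numpy as np


def riemannian_ascent(g, G, rho):
    """Maximise g^T eps subject to eps^T G eps <= rho^2 (first-order inner step)."""
    v = np.linalg.solve(G, g)
    return rho * v / np.sqrt(g @ v)


def monge_metric(g, alpha):
    """G = I + alpha^2 g g^T, the pullback through theta -> (theta, alpha * L(theta))."""
    return np.eye(g.size) + alpha**2 * np.outer(g, g)


rng = np.random.default_rng(1)
k, rho, alpha = 5, 0.05, 2.0
g = rng.standard_normal(k)                          # dL/dtheta at the current point
A = np.eye(k) + 0.5 * rng.standard_normal((k, k))   # hypothetical linear chart theta = A psi

# Riemannian step computed directly in theta-coordinates.
G = monge_metric(g, alpha)
eps_theta = riemannian_ascent(g, G, rho)

# Same step computed in psi-coordinates: chain-rule gradient and pulled-back metric.
eps_psi = riemannian_ascent(A.T @ g, A.T @ G @ A, rho)

# Euclidean SAM in each chart, for contrast.
sam_theta = rho * g / np.linalg.norm(g)
sam_psi = rho * (A.T @ g) / np.linalg.norm(A.T @ g)

print(np.allclose(A @ eps_psi, eps_theta), np.allclose(A @ sam_psi, sam_theta))
```

The Riemannian step agrees across the two charts because both the gradient and the metric transform consistently. The Euclidean SAM step uses the identity metric in every chart and therefore does not.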

6. Practical Implementations and Empirical Evaluation

RSAM algorithms:

Maintain the computational cost of SAM (two forward/backward passes per iteration), with modest additional overhead, even when enforcing geometric constraints such as orthogonality (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).
Require essentially the same hyperparameters as Euclidean SAM: perturbation radius $\rho$ and learning rate $\eta$.
Are reported to improve generalization and train–test stability. The reported benefits include flatter minima (smaller Hessian eigenvalues relative to SAM), larger test performance gains in supervised classification and in contrastive learning on vision benchmarks (summarized in the table below), and greater robustness to hyperparameter misspecification.

Empirical protocols utilize benchmarks such as CIFAR-10, CIFAR-100, FGVCAircraft, and architectures including ResNet-34/50, constrained via Stiefel manifold projections.

Benchmark	RSAM test gain, cross-entropy (CE)	RSAM test gain, supervised contrastive (SupCon)
CIFAR-10 (ResNet-34)	–	–
CIFAR-100	+1–3%	+1–5%

The sharpness of RSAM-trained models, as measured by the maximal Hessian eigenvalue, is reported to be consistently lower than that of SAM-trained models (Truong et al., 2023).
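A common way to estimate the maximal Hessian eigenvalue without forming the Hessian is power iteration on Hessian-vector products. The sketch below approximates those products with a central difference of gradients and tests the estimator on a quadratic with known spectrum. It is a generic illustration, not necessarily the measurement protocol of Truong et al. (2023). In a real network `grad_fn` would come from automatic differentiation, and all names here are hypothetical. Power iteration returns the eigenvalue of largest magnitude, which coincides with the largest eigenvalue at a minimum, where the Hessian is positive semidefinite.

```python
import numpy as np


def hvp_fd(grad_fn, x, v, h=1e-4):
    """Hessian-vector product approximated by a central difference of gradients."""
    return (grad_fn(x + h * v) - grad_fn(x - h * v)) / (2.0 * h)


def max_hessian_eig(grad_fn, x, iters=100, seed=0):
    """Power iteration on v -> H v; returns the Rayleigh-quotient estimate of lambda_max."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = hvp_fd(grad_fn, x, v)
        lam = float(v @ w)              # v^T H v with ||v|| = 1
        v = w / np.linalg.norm(w)       # renormalise for the next iteration
    return lam


# Quadratic test problem L(x) = 0.5 x^T H x with known spectrum {10, 3, 1}.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
H = Q @ np.diag([10.0, 3.0, 1.0]) @ Q.T


def grad_fn(x):
    return H @ x


lam_max = max_hessian_eig(grad_fn, np.ones(3))
print(lam_max)
```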

7. Connections, Limitations, and Ongoing Directions

RSAM provides a unifying geometric perspective on sharpness-aware training, generalizing previous approaches in loss-induced (Monge) Riemannian metrics (Jacobsen et al., 12 Feb 2025), manifold-constrained learning (Truong et al., 2023), and the pursuit of reparameterization invariance.

Notably, the term "RSAM" has been used in Euclidean, randomized smoothing (random-SAM) contexts as well (Khanh et al., 2024); however, those variants do not employ Riemannian geometry but rather exploit stochastic perturbation in Euclidean space.

Open research avenues include extending RSAM to more complex quotient geometries, exploring spectral regularization (e.g., Rényi sharpness), and applying RSAM in domains with inherent geometric constraints (e.g., equivariant networks, structured parameterizations).

In summary, Riemannian Sharpness-Aware Minimization advances sharpness-aware optimization by adapting adversarial perturbations and robust descent steps to the intrinsic geometry of the parameter space or loss landscape, yielding improved generalization, stability, and invariance properties compared to classical approaches (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).


References (3)

Sharpness-Aware Teleportation on Riemannian Manifolds (2023)

Monge SAM: Robust Reparameterization-Invariant Sharpness-Aware Minimization Based on Loss Geometry (2025)

Fundamental Convergence Analysis of Sharpness-Aware Minimization (2024)
