Riemannian Sharpness-Aware Minimization (RSAM)
- RSAM is a family of optimization techniques that generalizes SAM by integrating intrinsic Riemannian geometry into the loss minimization process.
- It utilizes differential geometry tools such as retraction, projection, and intrinsic metrics to guide adversarial perturbations on manifold-constrained spaces.
- RSAM achieves enhanced generalization and stability with minimal computational overhead, yielding flatter minima and improved performance on benchmarks.
Riemannian Sharpness-Aware Minimization (RSAM) encompasses a family of methodologies generalizing sharpness-aware optimization by incorporating notions of local or intrinsic geometry, particularly Riemannian metrics and manifolds, into the minimization of loss sharpness. These advances address the limitations of Euclidean, parameterization-dependent sharpness-aware methods, and provide frameworks suited for constrained or geometrically structured parameter spaces. Two main strands have emerged in recent literature: geometric reparameterization-invariant sharpness-aware minimization, and intrinsic manifold-constrained RSAM.
1. Objectives and Conceptual Framework
RSAM extends the conventional Sharpness-Aware Minimization (SAM)—which minimizes the worst-case loss within a neighborhood of parameters—by reformulating neighborhoods, distances, and gradients using Riemannian geometry instead of Euclidean space. This yields methods that optimize parameters constrained to Riemannian manifolds or adapt the adversarial perturbation direction using a Riemannian metric intrinsically tied to the loss geometry.
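Given a manifold $\mathcal{M}$ with an intrinsic (often problem-induced) dimension $d$, and empirical risk $L_S(\theta)$, the generic RSAM objective is
$$\min_{\theta \in \mathcal{M}} \; \max_{\tilde{\theta} \in B_\rho(\theta)} L_S(\tilde{\theta}),$$
with $B_\rho(\theta)$ the Riemannian ball of radius $\rho$ centered at $\theta$ (Truong et al., 2023).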
2. Geometric Foundations
RSAM requires tools from differential geometry:
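- Riemannian Metric: Inner product $\langle \cdot, \cdot \rangle_\theta$ defined on the tangent space $T_\theta\mathcal{M}$ at each $\theta \in \mathcal{M}$.
- Retraction: Map $R_\theta : T_\theta\mathcal{M} \to \mathcal{M}$ approximating the exponential map, with $R_\theta(0) = \theta$ and $\mathrm{D}R_\theta(0) = \mathrm{id}_{T_\theta\mathcal{M}}$.
- Projection: For embedded submanifolds, the ambient Euclidean gradient $\nabla L_S(\theta)$ of the loss $L_S$ is projected to $T_\theta\mathcal{M}$ to yield the Riemannian gradient $\operatorname{grad} L_S(\theta) = \mathrm{Proj}_\theta(\nabla L_S(\theta))$ via an orthogonal projection (Truong et al., 2023).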
This geometric machinery defines neighborhoods and gradient-based optimization steps. For statistical learning on quotient manifolds or the Stiefel manifold (e.g., orthogonality constraints on weight matrices), these operations are realized in closed form (e.g., QR retraction, symmetric projection).
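The closed-form operations mentioned above can be sketched for the Stiefel manifold as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation of Truong et al.: the function names `stiefel_project` and `qr_retract`, the matrix sizes, and the random stand-in gradient are all hypothetical. It uses the standard tangent projection $G - X\,\mathrm{sym}(X^\top G)$ for the embedded metric and the Q factor of a QR decomposition as the retraction.

```python
import numpy as np

def stiefel_project(X, G):
    """Project an ambient matrix G onto the tangent space of the Stiefel manifold at X."""
    XtG = X.T @ G
    sym = 0.5 * (XtG + XtG.T)          # symmetric part of X^T G
    return G - X @ sym

def qr_retract(X, V):
    """QR retraction: map the tangent step V at X back to a matrix with orthonormal columns."""
    Q, R = np.linalg.qr(X + V)
    # Flip column signs so that diag(R) is positive, which makes the Q factor unique.
    signs = np.sign(np.diag(R))
    signs[signs == 0] = 1.0
    return Q * signs

rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.standard_normal((6, 3)))   # a point with orthonormal columns
G = rng.standard_normal((6, 3))                    # stand-in for an ambient gradient
xi = stiefel_project(X, G)                         # Riemannian gradient direction
Y = qr_retract(X, -0.1 * xi)                       # one retracted descent step
```

The sign correction in `qr_retract` is a design choice that makes the retraction satisfy $R_X(0) = X$ regardless of the sign convention of the underlying QR routine.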
3. Algorithmic Procedures
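RSAM algorithms generally decompose each iteration into two steps:
- Inner maximization ("teleportation" step): Find the worst-case perturbation within a Riemannian ball by first-order Taylor expansion, which selects the maximal-ascent direction of the Riemannian gradient and maps it back to the manifold via retraction:
$$\epsilon_t = \rho \, \frac{\operatorname{grad} L_S(\theta_t)}{\|\operatorname{grad} L_S(\theta_t)\|_{\theta_t}}$$
followed by $\theta_t^{\mathrm{adv}} = R_{\theta_t}(\epsilon_t)$, where $R_{\theta_t}$ is the retraction at $\theta_t$ and $\|v\|_{\theta} = \sqrt{\langle v, v \rangle_{\theta}}$ is the norm induced by the Riemannian metric, which encodes local metric scaling.
- Outer update ("sharpness-aware descent"): Apply gradient descent using the gradient at the perturbed point, mapped back via retraction:
$$\theta_{t+1} = R_{\theta_t}\!\left(-\eta \, \mathrm{Proj}_{\theta_t}\!\big(\nabla L_S(\theta_t^{\mathrm{adv}})\big)\right)$$
with learning rate $\eta$ and $\mathrm{Proj}_{\theta_t}$ the orthogonal projection onto the tangent space at $\theta_t$.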
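The two steps above can be sketched on the simplest embedded manifold, the unit sphere, where projection and retraction are one-liners. This is an illustrative toy under stated assumptions rather than the authors' algorithm: the quadratic loss $x^\top A x$, the matrix `A`, the helper names, and the values of `rho` and `eta` are all hypothetical, and the gradient at the perturbed point is projected onto the tangent space at the current iterate before retracting.

```python
import numpy as np

A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])   # toy problem: minimize x^T A x on the unit sphere

def loss(x):
    return float(x @ A @ x)

def euclid_grad(x):
    return 2.0 * A @ x

def project(x, v):
    """Orthogonal projection of v onto the tangent space of the sphere at x."""
    return v - (x @ v) * x

def retract(x, v):
    """Retraction on the sphere: step in the ambient space, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def rsam_step(x, rho=0.05, eta=0.1):
    # Inner maximization: normalized Riemannian gradient ascent of length rho.
    rgrad = project(x, euclid_grad(x))
    eps = rho * rgrad / (np.linalg.norm(rgrad) + 1e-12)
    x_adv = retract(x, eps)
    # Outer update: gradient taken at x_adv, projected onto the tangent space at x.
    descent = project(x, euclid_grad(x_adv))
    return retract(x, -eta * descent)

x = np.ones(5) / np.sqrt(5.0)
initial_loss = loss(x)
for _ in range(300):
    x = rsam_step(x)
final_loss = loss(x)
```

Because the perturbation has fixed length `rho`, the iterate does not settle exactly at the minimizer but hovers in a small neighborhood of it, which is the expected behavior of a SAM-style update with a constant radius.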
In Riemannian parameterizations induced by the loss geometry (as in Monge SAM (Jacobsen et al., 12 Feb 2025)), the metric is $G(\theta) = I + \nabla L_S(\theta)\,\nabla L_S(\theta)^\top$, making both the perturbation norm and direction adapt to the local slope.
4. Reparameterization-Invariant and Loss-Induced (Monge) Metrics
A key RSAM approach is to define the Riemannian metric from the embedding of the parameter manifold into the loss surface, yielding reparameterization invariance. The pullback metric is
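$$G(\theta) = I + \nabla L_S(\theta)\, \nabla L_S(\theta)^\top$$
with the adversarial step
$$\delta^* = \rho \, \frac{\nabla L_S(\theta)}{\|\nabla L_S(\theta)\| \sqrt{1 + \|\nabla L_S(\theta)\|^2}}$$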
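For readers who want the intermediate step, a closed-form step of this kind follows from the standard first-order solution of the metric-constrained maximization combined with the Sherman-Morrison identity. The shorthand $g = \nabla L_S(\theta)$ and $G = I + g g^\top$ is introduced here for brevity; this is a sketch of one way to arrive at the result, not necessarily the presentation used by Jacobsen et al.

$$
\begin{aligned}
\delta^* &= \arg\max_{\delta^\top G \delta \le \rho^2} g^\top \delta
          = \rho\,\frac{G^{-1} g}{\sqrt{g^\top G^{-1} g}},\\
G^{-1} &= I - \frac{g g^\top}{1 + \|g\|^2}
\;\Rightarrow\; G^{-1} g = \frac{g}{1 + \|g\|^2},\quad
g^\top G^{-1} g = \frac{\|g\|^2}{1 + \|g\|^2},\\
\delta^* &= \rho\,\frac{g}{\|g\|\sqrt{1 + \|g\|^2}}.
\end{aligned}
$$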
This approach, termed Monge SAM (M-SAM), interpolates smoothly between SAM and vanilla gradient descent (GD): as $\|\nabla L_S\| \to 0$, $\delta^* \to \rho\, \nabla L_S / \|\nabla L_S\|$, recovering SAM; as $\|\nabla L_S\| \to \infty$, the step vanishes, recovering GD (Jacobsen et al., 12 Feb 2025). This metric yields a closed-form, invariant adversarial step and markedly improves robustness to hyperparameters as well as saddle-point escape properties.
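The interpolation between the two regimes can be checked numerically. The sketch below is an illustration, not code from the cited paper: `monge_step` and `metric_step` are hypothetical helper names, and the gradient vectors are made up. It compares the closed-form step against a generic solve with the metric $I + g g^\top$, then evaluates the step length for a very small and a very large gradient.

```python
import numpy as np

def monge_step(g, rho):
    """Closed-form adversarial step under the metric I + g g^T."""
    n = np.linalg.norm(g)
    return rho * g / (n * np.sqrt(1.0 + n**2))

def metric_step(g, rho):
    """Same step computed generically: rho * G^{-1} g / sqrt(g^T G^{-1} g)."""
    G = np.eye(g.size) + np.outer(g, g)
    Ginv_g = np.linalg.solve(G, g)
    return rho * Ginv_g / np.sqrt(g @ Ginv_g)

rho = 0.1
direction = np.array([3.0, -4.0]) / 5.0          # unit vector
small = monge_step(1e-4 * direction, rho)        # nearly flat region
large = monge_step(1e4 * direction, rho)         # very steep region
print(np.linalg.norm(small))   # close to rho: behaves like SAM
print(np.linalg.norm(large))   # close to 0: behaves like plain GD
```

In Euclidean norm the step length equals $\rho / \sqrt{1 + \|g\|^2}$, so the perturbation shrinks automatically in steep regions without a separate schedule for $\rho$.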
5. Theoretical Guarantees
Theoretical analyses of RSAM provide generalization bounds leveraging Riemannian neighborhoods and PAC-Bayes concentration, typically tightening the dependence on the parameter space's intrinsic dimension $d$ rather than the ambient parameter count $k$. Specifically, for RSAM with retraction $R$ and ball radius $\rho$, the bound of Truong et al. (2023) includes a constant reflecting geometric aspects of the retraction (e.g., the QR retraction on the Stiefel manifold).
For Monge SAM, invariance under reparameterization holds formally: for every smooth chart $\varphi$, the pullback metric guarantees that the adversarial step and corresponding descent trajectory commute with coordinate changes (Jacobsen et al., 12 Feb 2025).
6. Practical Implementations and Empirical Evaluation
RSAM algorithms:
- Maintain the computational cost of SAM (two forward/backward passes), with modest additional overhead, even when enforcing geometric constraints such as orthogonality (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).
- Require essentially the same hyperparameters as Euclidean SAM: perturbation radius $\rho$, learning rate $\eta$.
- Induce significantly improved generalization and train–test stability, including flatter minima (as indicated by smaller Hessian eigenvalues relative to SAM), larger test performance gains (+1–3% in supervised classification, +1–5% in contrastive learning on vision benchmarks), and greater robustness to hyperparameter misspecification.
Empirical protocols utilize benchmarks such as CIFAR-10, CIFAR-100, FGVCAircraft, and architectures including ResNet-34/50, constrained via Stiefel manifold projections.
| Model/Task | CE+SGD | CE+SAM | CE+RSAM | SupCon+SGD | SupCon+SAM | SupCon+RSAM |
|---|---|---|---|---|---|---|
| CIFAR-10 (ResNet-34) | – | – | – | – | – | – |
| CIFAR-100 | +1–3% | +1–3% | +1–3% | +1–5% | +1–5% | +1–5% |
Observed sharpness of RSAM-trained models, as measured by the maximal Hessian eigenvalue, is consistently reduced relative to SAM (Truong et al., 2023).
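One common way to estimate the maximal Hessian eigenvalue without forming the Hessian is power iteration on Hessian-vector products. The sketch below is a generic illustration, not the evaluation protocol of the cited papers: the function name `top_hessian_eigenvalue`, the finite-difference step `fd_eps`, and the toy quadratic loss are assumptions. In practice the gradient function would come from an autodiff framework and the products would usually be computed exactly rather than by finite differences.

```python
import numpy as np

def top_hessian_eigenvalue(grad_fn, theta, iters=100, fd_eps=1e-3, seed=0):
    """Estimate the largest-magnitude Hessian eigenvalue by power iteration.

    Hessian-vector products are approximated with central finite differences
    of the gradient, so the Hessian is never formed explicitly.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = (grad_fn(theta + fd_eps * v) - grad_fn(theta - fd_eps * v)) / (2.0 * fd_eps)
        eig = float(v @ hv)            # Rayleigh quotient for the current direction
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            break
        v = hv / norm
    return eig

# Toy quadratic loss 0.5 * theta^T H theta, whose Hessian is H by construction.
H = np.diag([1.0, 2.0, 5.0])
estimate = top_hessian_eigenvalue(lambda t: H @ t, np.zeros(3))
```

For manifold-constrained models, this measures sharpness in the ambient coordinates; a Riemannian Hessian would additionally require projecting the products onto the tangent space.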
7. Connections, Limitations, and Ongoing Directions
RSAM provides a unifying geometric perspective on sharpness-aware training, generalizing previous approaches in loss-induced (Monge) Riemannian metrics (Jacobsen et al., 12 Feb 2025), manifold-constrained learning (Truong et al., 2023), and the pursuit of reparameterization invariance.
Notably, the term "RSAM" has been used in Euclidean, randomized smoothing (random-SAM) contexts as well (Khanh et al., 2024); however, those variants do not employ Riemannian geometry but rather exploit stochastic perturbation in Euclidean space.
Open research avenues include extending RSAM to more complex quotient geometries, exploring spectral regularization (e.g., Rényi sharpness), and applying RSAM in domains with inherent geometric constraints (e.g., equivariant networks, structured parameterizations).
In summary, Riemannian Sharpness-Aware Minimization advances sharpness-aware optimization by adapting adversarial perturbations and robust descent steps to the intrinsic geometry of the parameter space or loss landscape, yielding improved generalization, stability, and invariance properties compared to classical approaches (Truong et al., 2023, Jacobsen et al., 12 Feb 2025).