Non-Euclidean Trust Region Framework
- Non-Euclidean Trust Region (NTR) frameworks are optimization methods that substitute the Euclidean norm with alternative metrics, divergences, or manifold structures to handle complex geometries.
- NTR employs geometry-aware model building and trust-region subproblems using metric balls, divergence constraints, or tangent space projections for effective step computation.
- NTR ensures global convergence and practical performance by adapting step acceptance criteria and updating the trust region based on geometry-specific predicted reductions.
The Non-Euclidean Trust Region (NTR) framework generalizes classical trust-region optimization to settings where the underlying space, geometry, or modeling assumptions preclude standard Euclidean structure. NTR approaches replace the canonical Euclidean norm with more general metrics, divergences, or even metric spaces, enabling principled optimization in highly abstract, nonsmooth, or structured domains. This includes compact metric spaces (lacking vector-space operations), spaces of probability distributions under divergence metrics, matrix spaces with operator or nuclear norms, and manifolds equipped with Riemannian or weighted inner products. The NTR formalism encapsulates model-building, step computation, criticality assessment, and global convergence, with algorithmic and theoretical developments spanning smooth, nonsmooth, and even combinatorial/integer settings.
1. Foundational Principles and Problem Formulation
NTR frameworks are unified by the use of a generalized local model and trust-region subproblem, embedded in a non-Euclidean space. The abstract optimization problem is
$$\min_{x \in \mathcal{X}} f(x),$$
where $\mathcal{X}$ may be a compact metric space with metric $d$ (Manns, 2024), a finite-dimensional vector space with an arbitrary norm $\|\cdot\|$ (Kovalev, 16 Mar 2025), a probability simplex with divergence metrics (Zhao et al., 2019), a Riemannian manifold (Mor et al., 2020), or a Hilbert space with a weighted inner product (Maia et al., 13 Jan 2026).
Model construction: At iterate $x_k$, a model function $m_k$ is built to approximate $f$ locally, using only structure that is meaningful for the given geometry. The model must agree with $f$ at the current point ($m_k(x_k) = f(x_k)$) and encode locality via a geometry-induced ball or divergence.
Trust region: The generalized trust region of radius $\Delta_k$ around $x_k$ defines permissible steps. Examples:
- Metric ball: $\{x \in \mathcal{X} : d(x, x_k) \leq \Delta_k\}$ (Manns, 2024)
- Norm ball: $\{x : \|x - x_k\| \leq \Delta_k\}$ (Kovalev, 16 Mar 2025)
- Distributional ball (e.g., KL or Bregman divergence): $\mathbb{E}_{s\sim\rho_{\pi_k}}[D(\pi(\cdot|s) \|\pi_k(\cdot|s))] \leq \delta_k$ (Zhao et al., 2019)
- Riemannian geodesic or tangent-space ball: $\{\eta \in T_{x_k}\mathcal{M} : \|\eta\|_{x_k} \leq \Delta_k\}$ on a sphere or submanifold with equality constraint (Mor et al., 2020)
- Weighted ball: $\{x : \|x - x_k\|_P \leq \Delta_k\}$ with $\|v\|_P^2 = \langle P v, v \rangle$ for a positive-definite weight operator $P$ (Maia et al., 13 Jan 2026)
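As a concrete illustration, membership tests for three of the trust regions above can be written directly. This is my own sketch (function names and the discrete-distribution treatment are assumptions, not code from the cited works):

```python
import numpy as np

def in_norm_ball(x, x_k, delta, ord=2):
    """Norm ball: ||x - x_k|| <= delta, for any vector norm."""
    return np.linalg.norm(x - x_k, ord=ord) <= delta

def in_kl_ball(p, p_k, delta):
    """Distributional ball: KL(p || p_k) <= delta for discrete distributions."""
    mask = p > 0
    kl = np.sum(p[mask] * np.log(p[mask] / p_k[mask]))
    return kl <= delta

def in_weighted_ball(x, x_k, delta, P):
    """Weighted ball: ||x - x_k||_P <= delta with ||v||_P^2 = v^T P v."""
    v = x - x_k
    return float(v @ P @ v) <= delta ** 2
```

Each test touches only structure the corresponding geometry provides: a norm, a divergence, or a weighted inner product.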
Step calculation: The subproblem is solved approximately or exactly, subject to the trust region, delivering the predicted reduction $\mathrm{pred}_k = m_k(x_k) - m_k(x_k^+)$ and the actual (objective) reduction $\mathrm{ared}_k = f(x_k) - f(x_k^+)$ at the trial point $x_k^+$. The acceptance ratio $\rho_k = \mathrm{ared}_k/\mathrm{pred}_k$ guides step acceptance and radius adjustment.
2. Trust-Region Algorithms in Non-Euclidean Geometries
The formal NTR algorithm generalizes standard trust-region update rules to the chosen geometry, maintaining the core logic of model-based local search with adaptive radius control.
Canonical steps:
- Build model $m_k$ at $x_k$
- Compute a step $s_k$ (or trial point $x_k^+$) by (approximately) minimizing $m_k$ over the trust region
- Compute predicted and actual reductions; calculate $\rho_k = \mathrm{ared}_k/\mathrm{pred}_k$
- Accept/reject the step and update the radius $\Delta_k$ or its equivalent (e.g., a divergence bound)
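The canonical steps above can be sketched as a minimal loop. This is an illustrative sketch, assuming a smooth objective, an $\ell_2$ ball, a linear model with a Cauchy-type step, and made-up default parameter values; it is not any cited paper's algorithm:

```python
import numpy as np

def ntr_minimize(f, grad, x0, delta0=1.0, delta_max=10.0,
                 eta1=0.1, eta2=0.75, gamma_dec=0.5, gamma_inc=2.0,
                 tol=1e-6, max_iter=200):
    """Generic trust-region loop; the geometry enters only through the
    subproblem solver (here: a steepest step in an l2 ball)."""
    x, delta = np.asarray(x0, dtype=float), delta0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:        # criticality test in chosen norm
            break
        # Subproblem: minimize the linear model over the ball -> steepest step
        s = -delta * g / np.linalg.norm(g)
        pred = -(g @ s)                      # predicted (model) reduction > 0
        ared = f(x) - f(x + s)               # actual objective reduction
        rho = ared / pred                    # acceptance ratio
        if rho >= eta1:                      # accept the step
            x = x + s
        if rho >= eta2:                      # excellent agreement: enlarge
            delta = min(gamma_inc * delta, delta_max)
        elif rho < eta1:                     # poor agreement: shrink
            delta = gamma_dec * delta
    return x
```

Swapping the norm, the model, and the subproblem solver (while keeping the accept/shrink/grow logic) yields the non-Euclidean variants discussed below.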
Distinctive adaptations:
- In metric spaces, no vector operations are needed; only the metric ball is required (Manns, 2024)
- For divergences/Bregman distances, subproblems are solved in the geometry of the divergence (Zhao et al., 2019)
- For matrix problems, spectral or nuclear norms yield explicit forms (e.g., orthogonalized updates for matrix variables) (Kovalev, 16 Mar 2025)
- On manifolds, the subproblem is posed in the tangent space with Riemannian metric, using projections and retractions (Mor et al., 2020)
- Weighted-proximal methods embed all terms (including proximals and Cauchy steps) in weighted norms (Maia et al., 13 Jan 2026)
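For the matrix case, the linear-model subproblem under a spectral-norm trust region has a closed-form orthogonalized solution. The SVD-based construction below is a standard sketch of that form (my illustration, not code from the cited paper):

```python
import numpy as np

def spectral_tr_step(G, delta):
    """Minimize the linear model <G, S> over the spectral-norm ball
    ||S||_2 <= delta: the optimum is S = -delta * U V^T, where
    G = U diag(s) V^T is the thin SVD. The attained model value is
    -delta * ||G||_nuclear, since the nuclear norm is dual to the
    spectral norm."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return -delta * (U @ Vt)
```

The resulting step is an orthogonalized (semi-orthogonal) matrix scaled by the radius, which is the structure underlying Muon-style updates.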
General parameterization:
Key parameters (shrinkage/enlargement factors $\gamma_{\mathrm{dec}} < 1 < \gamma_{\mathrm{inc}}$, acceptance thresholds $\eta_1 \leq \eta_2$, maximum radius, model inexactness, and gradient inexactness) are adapted analogously to the Euclidean case, but all calculations (steps, norms, reductions) are performed in the relevant geometry.
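A minimal sketch of the radius-control rule driven by these parameters (the default values are illustrative assumptions, not taken from any cited paper):

```python
def update_radius(rho, delta, eta1=0.1, eta2=0.75,
                  gamma_dec=0.5, gamma_inc=2.0, delta_max=10.0):
    """Radius control from the acceptance ratio rho = ared/pred.
    The same rule applies verbatim whether delta bounds a metric ball,
    a divergence, or a weighted-norm ball."""
    if rho < eta1:                            # poor model agreement: shrink
        return gamma_dec * delta
    if rho >= eta2:                           # excellent agreement: enlarge
        return min(gamma_inc * delta, delta_max)
    return delta                              # acceptable: keep radius
```

Only the meaning of the radius changes with the geometry; the control logic does not.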
3. Stationarity Concepts and Theoretical Guarantees
NTR frameworks extend classical first-order stationarity to settings where gradients or subgradients may not exist or are ill-defined. The main tool is a criticality measure capturing first-order optimality in the chosen geometry:
- In metric settings: $C(x) = \limsup_{\Delta\to 0} \frac{\mathrm{pred}(x,\Delta)}{\Delta}$, so $C(x) = 0$ iff $x$ is stationary (Manns, 2024)
- In composite-normed spaces: $C(x) = \|\nabla f(x)\|_*$, with $\|\cdot\|_*$ the dual norm (Kovalev, 16 Mar 2025)
- In Riemannian settings: critical points are characterized by projected gradients (a KKT system for affine-eigenpairs) (Mor et al., 2020)
- In weighted-proximal methods: subdifferential conditions measured in the weighted $P$-norm (Maia et al., 13 Jan 2026)
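In the normed-space case the criticality measure reduces to a dual norm: for a smooth $f$ and a linear model, the best reduction over a radius-$\Delta$ ball is $\Delta\,\|\nabla f(x)\|_*$, so $\mathrm{pred}/\Delta$ converges to the dual norm of the gradient. A sketch under those assumptions (the helper name is hypothetical; the dual-norm pairs are standard):

```python
import numpy as np

def criticality_dual_norm(g, ord=2):
    """Dual-norm criticality measure for a norm-ball trust region:
    l2 is self-dual; an l-infinity ball pairs with the l1 dual norm;
    an l1 ball pairs with the l-infinity dual norm."""
    dual = {2: 2, np.inf: 1, 1: np.inf}[ord]
    return np.linalg.norm(g, ord=dual)
```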
Convergence analysis:
- Under compactness and regularity, accumulation points of the NTR sequence are stationary with respect to the chosen criticality measure; global convergence is obtained under mild assumptions (Manns, 2024, Kovalev, 16 Mar 2025, Zhao et al., 2019, Maia et al., 13 Jan 2026, Mor et al., 2020).
- Complexity bounds are derived in the relevant norm or metric, e.g., iteration bounds for driving the stationarity measure below a tolerance in the weighted $P$-norm (Maia et al., 13 Jan 2026), and improved iteration bounds for star-convex problems (Kovalev, 16 Mar 2025).
4. Specialized Geometric and Statistical Instances
The NTR framework subsumes a broad array of algorithms:
| Setting | Geometry / Metric | Key Algorithmic Instantiation |
|---|---|---|
| Metric spaces | General metric | Integer control, switching cost, TV-regularized (Manns, 2024) |
| Reinforcement learning | KL/Bregman divergence | Stochastic Policy Optimization (Zhao et al., 2019) |
| Matrix/Deep Learning | Spectral/Nuclear norm | Muon, normalized SGD, signSGD (Kovalev, 16 Mar 2025) |
| Weighted Hilbert/Euclidean spaces | Weighted norm | Proximal Trust-Region, inexact prox (Maia et al., 13 Jan 2026) |
| Manifolds/Spheres | Riemannian metric | Riemannian trust-region, preconditioned (Mor et al., 2020) |
Notable technical structures:
- Natural-gradient or preconditioned step computation (e.g., for divergence constraints (Zhao et al., 2019))
- Momentum and stochasticity explicit in algorithmic models (e.g., moving average surrogate for noise reduction (Kovalev, 16 Mar 2025))
- Inexact proximal operators formalized via Fréchet subdifferentials (Maia et al., 13 Jan 2026)
- Explicit orthogonalization and norm-based projections yielding analytic steps in certain spaces (Kovalev, 16 Mar 2025)
5. Computational and Practical Insights
NTR algorithms retain, and often improve on, the practical features that make classical trust-region methods attractive.
- Non-Euclidean radii scheduling—e.g., no-reset vs. reset after acceptance—substantially impacts runtime. Empirically, no-reset schemes can reduce runtime by 40–80% with marginal effect on final objective (Manns, 2024).
- Using criticality and model-reduction lower bounds tailored to the geometry avoids premature shrinking of the radius and guarantees global progress.
- Explicit geometry-aware updates (e.g., spectral/orthogonal steps in Muon) have been shown to outperform heuristic alternatives (e.g., Orthogonal-SGDM) by maintaining unbiasedness in the surrogate direction (Kovalev, 16 Mar 2025).
- For probability distributions, decoupling mean and variance updates with careful divergence constraints prevents premature collapse of the variance, which is essential for exploration in reinforcement learning (Zhao et al., 2019).
- Weighted and Riemannian metrics enable variable preconditioning and more precise adaptation to local curvature, improving asymptotic and practical rates (Mor et al., 2020, Maia et al., 13 Jan 2026).
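The mean/variance decoupling for Gaussian policies rests on the fact that the KL divergence between diagonal Gaussians splits into an additive mean term and variance term, so each can be bounded by its own trust-region radius. A sketch of that split (my construction, not the cited algorithm):

```python
import numpy as np

def gaussian_kl_terms(mu, sigma, mu_k, sigma_k):
    """KL(N(mu, diag(sigma^2)) || N(mu_k, diag(sigma_k^2))) returned as
    (mean_term, variance_term); their sum is the full KL divergence.
    Separate bounds on each term keep the variance from collapsing
    while the mean still moves."""
    mean_term = np.sum((mu - mu_k) ** 2 / (2 * sigma_k ** 2))
    var_term = np.sum(np.log(sigma_k / sigma)
                      + sigma ** 2 / (2 * sigma_k ** 2) - 0.5)
    return mean_term, var_term
```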
6. Extensions, Implementation, and Guidelines
Implementation of NTR algorithms is guided by a few universal steps:
- Geometry selection: Specify the norm, metric, or divergence compatible with domain structure (e.g., operator norm for matrices, KL for distributions, weighted norm for PDEs).
- Analytic or numerical step solver: For many norms, analytic solutions exist (normalized, sign, orthogonal steps); nonsmooth or composite problems may require proximal solvers in the chosen geometry (Kovalev, 16 Mar 2025, Maia et al., 13 Jan 2026).
- Momentum and variance handling: Necessary under stochastic or noisy oracles; buffer averaging improves robustness (Kovalev, 16 Mar 2025).
- Parameter tuning: Step-size, radius, and momentum are adapted based on problem-specific curvature and noise characteristics.
- Regularizer handling: Convex penalties (e.g., weight decay) are embedded directly in the NTR subproblem, maintaining decoupling from step-size (Kovalev, 16 Mar 2025).
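For the analytic-step-solver guideline above, two of the standard closed forms can be sketched: minimizing the linear model $\langle g, s\rangle$ over a norm ball gives the normalized step for $\ell_2$ and the sign step for $\ell_\infty$ (`analytic_step` is a hypothetical helper; the closed forms themselves are standard):

```python
import numpy as np

def analytic_step(g, delta, norm="l2"):
    """Closed-form minimizer of <g, s> over a radius-delta norm ball."""
    if norm == "l2":     # normalized gradient step
        return -delta * g / np.linalg.norm(g)
    if norm == "linf":   # sign step (signSGD-style)
        return -delta * np.sign(g)
    raise ValueError(f"no analytic step for norm {norm!r}")
```

Other norms (spectral, weighted) admit analogous closed or semi-closed forms; nonsmooth composite terms fall back to proximal subproblem solvers in the chosen geometry.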
For high-dimensional or structured domains (e.g., deep neural networks, optimal control), NTR provides a flexible theoretical foundation for combining geometry-aware adaptivity, global convergence, and practical tractability.
For comprehensive algorithmic and theoretical details, see the foundational resources (Manns, 2024, Zhao et al., 2019, Kovalev, 16 Mar 2025, Maia et al., 13 Jan 2026, Mor et al., 2020).