Non-Euclidean Trust-Region Optimization
- Non-Euclidean trust-region optimization is a framework that generalizes classical methods by using alternative norms and metrics to define adaptive trust sets.
- It employs shape-changing norms and Riemannian geometries to address high-dimensional, non-smooth, and composite problems through both first- and second-order techniques.
- Practical applications in integer optimal control and deep learning demonstrate improved convergence, flexible step selection, and accelerated computational performance.
Non-Euclidean trust-region optimization generalizes classical trust-region (TR) methods by allowing the definition of the "trust" set via non-Euclidean norms, metrics, or proximities and, in some cases, by formulating the subproblems on spaces lacking vector-space structure. This framework encompasses both second-order methods with adaptive geometry and first-order approaches relevant to large-scale, composite, or non-smooth problems. The non-Euclidean perspective interfaces directly with modern quasi-Newton methods, Riemannian optimization, and stochastic and momentum methods, as detailed in recent works (Manns, 16 Dec 2024; Brust et al., 2022; Mor et al., 2020; Kovalev, 16 Mar 2025). The overarching motivation is improved convergence behavior, problem-adapted step selection, and algorithmic flexibility, particularly for high-dimensional or geometrically intricate problems.
1. Formalism and Abstract Metric-Space Trust-Region Methods
Classical TR methods seek a step $s_k$ by minimizing a (typically quadratic) local model $m_k(s)$ of the objective subject to $\|s\|_2 \le \Delta_k$, using the Euclidean norm. Non-Euclidean generalizations replace the Euclidean norm by a user-chosen norm $\|\cdot\|$, a shape-adaptive norm, or, more generally, a metric, or even formulate the method on a metric space $(X, d)$, possibly lacking vector addition or scaling.
Metric-Space Trust Regions
Let $(X, d)$ be a compact metric space and $f: X \to \mathbb{R}$ an objective. At each iteration $k$, a model $m_k: X \to \mathbb{R}$ with $m_k(x_k) = f(x_k)$ and a trust-region radius $\Delta_k > 0$ are used. The subproblem is

$$\min_{x \in X} \; m_k(x) \quad \text{s.t.} \quad d(x, x_k) \le \Delta_k,$$

where $d$ measures proximity and does not require vector-space operations. The acceptance criterion involves the ratio

$$\rho_k = \frac{f(x_k) - f(x_{k+1})}{m_k(x_k) - m_k(x_{k+1})},$$

analogous to classical TR methods, and $\Delta_k$ is updated based on this ratio, typically via doubling or halving rather than resetting to an initial value (Manns, 16 Dec 2024).
Convergence in this setting relies on a "criticality" function $c: X \to [0, \infty)$, with $c(x) = 0$ characterizing first-order stationarity, and a suite of regularity/finiteness conditions on the model decrease, reduction ratios, and small-move properties. Under these, the method produces accumulation points $x^*$ that are stationary (i.e., $c(x^*) = 0$).
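The following minimal sketch illustrates this loop in Python, assuming a hypothetical user-supplied solver `subproblem(x, delta)` that returns a candidate in the metric ball around `x` together with the predicted model decrease; the no-reset doubling/halving radius update follows the description above.

```python
def metric_tr(f, subproblem, x0, delta0=1.0, eta=0.1, tol=1e-10, max_iter=1000):
    # Abstract TR loop on a metric space: no vector operations on x are used.
    x, delta = x0, delta0
    for _ in range(max_iter):
        # Hypothetical solver: candidate in {y : d(y, x) <= delta} plus the
        # predicted decrease m_k(x) - m_k(x_trial) >= 0.
        x_trial, predicted = subproblem(x, delta)
        if predicted <= tol:  # model offers no decrease: x is (near-)stationary
            break
        rho = (f(x) - f(x_trial)) / predicted  # actual vs. predicted reduction
        if rho >= eta:
            x = x_trial       # accept the step
            delta *= 2.0      # no-reset update: double the radius on success
        else:
            delta *= 0.5      # halve the radius on rejection
    return x
```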
2. Non-Euclidean Trust-Region Subproblems
In non-Euclidean settings, the trust-region constraint is defined by a norm (not necessarily Euclidean) or by a geometry that adapts to the local curvature or problem structure.
Shape-Changing Norms and Curvature-Adaptive Geometry
At each iteration, define the model $m_k(s) = g_k^\top s + \tfrac{1}{2} s^\top B_k s$, where $B_k$ is an approximation to $\nabla^2 f(x_k)$. The trust region is given by a "shape-changing" norm

$$\|s\|_{P,\infty} := \max\left\{ \|P_\parallel^\top s\|_\infty,\; \|P_\perp^\top s\|_2 \right\},$$

where $P_\parallel$ and $P_\perp$ are determined by the eigenspaces of $B_k$. The resulting subproblem typically decouples into lower-dimensional or coordinate-wise constrained quadratic problems, often admitting closed-form solutions. This construction enables effective preconditioning and efficient solution even when $B_k$ is constructed via low-rank or memory-limited quasi-Newton updates, e.g., multipoint symmetric secant (MSS) methods (Brust et al., 2022).
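To make the decoupling concrete, here is a minimal dense-matrix sketch of the subproblem solve under the $(P,\infty)$ norm, assuming a full eigendecomposition of $B_k$ is affordable; limited-memory implementations instead use a partial eigendecomposition and handle the orthogonal complement with a 2-norm ball.

```python
import numpy as np

def shape_changing_tr_step(g, B, delta):
    # Solve  min_s  g^T s + 0.5 s^T B s   s.t.  ||P^T s||_inf <= delta,
    # where B = P diag(lam) P^T. In the eigenbasis v = P^T s the problem
    # splits into independent 1-D quadratics over the interval [-delta, delta].
    lam, P = np.linalg.eigh(B)
    ghat = P.T @ g
    v = np.zeros_like(ghat)
    for i, (li, gi) in enumerate(zip(lam, ghat)):
        if li > 0:
            v[i] = np.clip(-gi / li, -delta, delta)  # convex: clip the minimizer
        elif gi != 0.0:
            v[i] = -delta * np.sign(gi)              # concave/linear: boundary
        elif li < 0:
            v[i] = delta                             # symmetric: either endpoint
    return P @ v  # map the decoupled solution back to the original coordinates
```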
Non-Euclidean First-Order Trust-Region Steps
A first-order non-Euclidean TR step solves

$$\min_{s} \; \langle \nabla f(x_k), s \rangle \quad \text{s.t.} \quad \|s\| \le \Delta_k,$$

where $s$ ranges over a finite-dimensional vector space and $\|\cdot\|$ is any norm (such as $\ell_p$ norms, the matrix spectral norm, or others). For deep learning, when $\|\cdot\|$ is the matrix spectral norm, the solution is the orthogonalized gradient $s_k = -\Delta_k U V^\top$, where $\nabla f(x_k) = U \Sigma V^\top$ is a reduced SVD; for other norms, the step corresponds to normalized-gradient or signSGD-type updates (Kovalev, 16 Mar 2025).
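A sketch of these closed-form steps for three norm choices (the function name and interface are illustrative):

```python
import numpy as np

def first_order_tr_step(G, delta, norm="spectral"):
    # Minimize <G, S> over the ball {S : ||S|| <= delta} for several norms.
    if norm == "spectral":
        # Spectral-norm ball: S* = -delta * U V^T (orthogonalized gradient).
        U, _, Vt = np.linalg.svd(G, full_matrices=False)
        return -delta * (U @ Vt)
    if norm == "l2":
        # Euclidean/Frobenius ball: normalized-gradient step.
        return -delta * G / np.linalg.norm(G)
    if norm == "linf":
        # Entrywise l_inf ball: sign step (signSGD-type update).
        return -delta * np.sign(G)
    raise ValueError(f"unknown norm: {norm}")
```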
3. Riemannian and Geometry-Adaptive Trust-Region Methods
Non-Euclidean TR methods on manifolds and with variable metrics extend the algorithm to structured domains with intrinsic geometric constraints.
Riemannian Trust-Region Framework
For the boundary trust-region subproblem (BTRS)

$$\min_{x \in \mathbb{R}^n} \; \tfrac{1}{2} x^\top A x + b^\top x \quad \text{s.t.} \quad x^\top M x = \Delta^2,$$

the feasible set is a Riemannian submanifold of $\mathbb{R}^n$ (an ellipsoid, or a sphere when $M = I$). The Riemannian gradient and Hessian are computed using the chosen metric (e.g., $\langle u, v \rangle_M = u^\top M v$), and retraction steps ensure feasibility (Mor et al., 2020). Riemannian TR (RTR) methods solve subproblems in the tangent space subject to the manifold constraints, preserving curvature and geometric fidelity. Preconditioning can be incorporated via a variable metric $M$, tuned to local Hessian properties, facilitating globally convergent, matrix-free methods.
Critical points correspond to affine eigenpairs, and the distinction between "easy" and "hard" cases for global optimality is made explicit in this framework.
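As an illustration, here is a minimal sketch of one Riemannian gradient step for the sphere case $M = I$; the tangent-space projection and normalization retraction are standard, while the full RTR method of (Mor et al., 2020) solves a TR subproblem in the tangent space rather than taking a fixed-step gradient move.

```python
import numpy as np

def riemannian_gradient_step(A, b, x, delta, step_size=1e-2):
    # One Riemannian gradient step for  min 0.5 x^T A x + b^T x  on the
    # sphere {x : ||x||_2 = delta}  (the M = I special case of the BTRS).
    egrad = A @ x + b                              # Euclidean gradient
    rgrad = egrad - ((x @ egrad) / delta**2) * x   # project onto the tangent space
    y = x - step_size * rgrad                      # move in the tangent space
    return delta * y / np.linalg.norm(y)           # retraction: renormalize
```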
4. Algorithmic Structures and Theoretical Guarantees
Non-Euclidean TR methods are characterized by their subproblem structure, step acceptance, radius update, and global convergence properties. The following table summarizes main algorithmic steps in several settings:
| Setting | Subproblem (TR) | Model/Norm | Optimality |
|---|---|---|---|
| Metric-space (abstract) (Manns, 16 Dec 2024) | $\min_x m_k(x)$ s.t. $d(x, x_k) \le \Delta_k$ | Model on metric space $(X, d)$ | $c(x^*) = 0$ |
| Shape-changing TR (Brust et al., 2022) | $\min_s m_k(s)$ s.t. $\lVert s\rVert_{P,\infty} \le \Delta_k$ | Adaptive norm $\lVert\cdot\rVert_{P,\infty}$ | $\nabla f(x^*) = 0$ |
| Riemannian TR (Mor et al., 2020) | Quadratic in tangent space $T_x\mathcal{M}$ | Riemannian metric $\langle\cdot,\cdot\rangle_M$ | $\operatorname{grad} f = 0$ |
| First-order TR (Kovalev, 16 Mar 2025) | $\min_s \langle \nabla f(x_k), s\rangle$ s.t. $\lVert s\rVert \le \Delta_k$ | Arbitrary norm $\lVert\cdot\rVert$ | $\nabla f(x^*) = 0$ |
Convergence results follow under mild conditions:
- In abstract metric spaces, accumulation points are stationary provided the criticality gap and reduction conditions (A1)-(A4) hold.
- For shape-changing and Riemannian methods, under Lipschitz, boundedness, and subproblem-accuracy assumptions, iterates converge to stationary points, with complexity bounds matching those of Euclidean methods.
- First-order non-Euclidean TR methods with momentum are proven to achieve state-of-the-art convergence rates for smooth, non-convex, and star-convex problems (Kovalev, 16 Mar 2025).
5. Applications in Integer Optimal Control and Deep Learning
Non-Euclidean TR methods have been applied to problems where classical approaches are inadequate, including discrete-valued (integer) control, total variation (TV)-regularized estimation, and stochastic training of deep neural networks.
Integer Optimal Control with TV Regularization
When controls are restricted to discrete values and the penalty involves total variation, the admissible set has no vector-space structure. The trust-region subproblem is formulated using an $L^1$-metric, $d(u, v) = \|u - v\|_{L^1(\Omega)}$, where the controls $u, v: \Omega \to V$ take values in a discrete set $V \subset \mathbb{R}$ (Manns, 16 Dec 2024). This setting requires the non-Euclidean, metric-space formulation. Convergence is shown via properties of the total-variation jump sum of the regularizer, and computational benchmarks reveal that no-reset radius adaptation yields 40–80% faster algorithms with less than 10% degradation in final objectives.
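For intuition, here is a minimal sketch of the two quantities involved for a one-dimensional piecewise-constant control on a uniform mesh (the function names and mesh-width convention are illustrative):

```python
import numpy as np

def total_variation(u):
    # TV of a piecewise-constant 1-D control: the sum of jumps between cells.
    return np.abs(np.diff(u)).sum()

def l1_metric(u, v, h=1.0):
    # L^1 distance on a uniform mesh with cell width h; this defines the
    # trust region {u : d(u, u_k) <= Delta} around the current control u_k.
    return h * np.abs(u - v).sum()

u = np.array([0, 0, 1, 1, 2])   # integer-valued control
v = np.array([0, 1, 1, 1, 2])   # candidate within the L^1 ball
print(total_variation(u), l1_metric(u, v))  # -> 2 1.0
```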
Deep Learning Optimization via Non-Euclidean Trust-Regions
Matrix gradient orthogonalization can be interpreted as a spectral-norm trust-region step. The Muon optimizer applies a stochastic non-Euclidean TR method with momentum and a spectral-norm constraint. Algorithmic and theoretical analyses reveal improved convergence rates, tighter variance bounds, and practical superiority over methods that orthogonalize the gradient but not the momentum (such as Orthogonal-SGDM) (Kovalev, 16 Mar 2025).
The stochastic non-Euclidean TR framework also unifies normalized SGD ($\ell_2$-norm), signSGD ($\ell_\infty$-norm), and other variants by suitable norm selection. Convergence bounds are established in deterministic and stochastic settings, for both smooth non-convex and star-convex objectives.
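A minimal sketch of such a Muon-style update (hyperparameters illustrative; practical implementations replace the full SVD with a few Newton–Schulz iterations). Note that the momentum buffer, not the raw gradient, is orthogonalized:

```python
import numpy as np

def orthogonalize(M):
    # Nearest orthogonal factor U V^T of M; in practice approximated by
    # Newton-Schulz iterations instead of an exact SVD.
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def muon_style_update(W, G, m, lr=0.02, beta=0.95):
    # Stochastic non-Euclidean TR step with momentum: orthogonalize the
    # momentum buffer m, not the raw gradient G (spectral-norm step of radius lr).
    m = beta * m + (1.0 - beta) * G
    W = W - lr * orthogonalize(m)
    return W, m
```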
6. Practical Implications, Computational Performance, and Guidelines
Empirical results across large-scale CUTEst test sets indicate that shape-changing norm TR methods, especially those with dense initialization in the MSS matrix, consistently outperform classical Euclidean-norm approaches in both function evaluations and CPU time, typically solving 85% of problems within a factor of two of the best solver's time (versus 60% for classical truncated CG) (Brust et al., 2022). In integer optimal control with TV, the avoidance of radius resets leads to 40–80% runtime reduction at minimal loss in objective (Manns, 16 Dec 2024).
Best practice guidelines derived from applications include:
- Adapting the norm to the problem structure, e.g., using the spectral norm for matrix layers or the $\ell_\infty$ norm (sign-type updates) for quantized setups in deep learning.
- Employing momentum and step-size selection mechanisms that harmonize with the underlying geometry.
- Leveraging dense quasi-Newton initializations together with a well-chosen memory size and subspace-initialization window for robust performance in shape-changing TRs (Brust et al., 2022).
- Decoupling weight decay from the main update, as in AdamW, and integrating regularization directly into the trust-region projected step (Kovalev, 16 Mar 2025).
- Employing variable-metric preconditioning in Riemannian TR settings to moderate the Hessian condition number and accelerate local convergence (Mor et al., 2020).
7. Connections, Extensions, and Outlook
Non-Euclidean trust-region optimization subsumes a spectrum of modern algorithmic paradigms by allowing problem-adapted local geometry, non-vector-space admissible sets, and non-Euclidean constraints. This enables seamless extension to Riemannian manifolds, convex-composite models, and combinatorial feasible sets.
A plausible implication is that further development of efficient solvers for non-Euclidean TR subproblems, especially in stochastic or high-dimensional regimes, will be central to advances in optimization for structured deep networks, large-scale discrete control, and machine learning problems with intricate regularization or geometry. Connection to affine eigenproblem perspectives via Riemannian TRs suggests fruitful avenues for theoretical unification and efficient solver design (Mor et al., 2020).
Open research questions include refined complexity bounds under typical loss landscapes in deep learning, optimal adaptive norm selection, and characterization of global convergence in highly nonconvex non-Euclidean settings. Non-Euclidean trust-region optimization thus remains a central and expanding theme in theoretical and applied optimization research.