Trajectory-Restricted Optimization Conditions and Geometry-Aware Linear Convergence

Published 18 Apr 2026 in math.OC, cs.LG, and math.ST | (2604.17067v1)

Abstract: Linear convergence of first-order methods is typically characterized by global optimization conditions whose constants reflect worst-case geometry of the ambient space. In high-dimensional or structured problems, these global constants can be arbitrarily conservative and fail to capture the geometry actually encountered by optimization trajectories. In this paper, we develop a trajectory-restricted framework for linear convergence based on localized geometric regularity. We introduce restricted variants of the Polyak--Łojasiewicz inequality, error bound, and quadratic growth conditions that are required to hold only on subsets of the domain. We show that classical convergence guarantees extend under these localized conditions, and in key cases, we develop new arguments that yield explicit relationships between the corresponding constants. The resulting rates are governed by geometric quantities associated with the regions traversed by the algorithm. For polyhedral composite problems, we prove that convergence is controlled by restricted Hoffman constants corresponding to the active polyhedral faces visited along the trajectory. Once the iterates enter a well-conditioned face, the effective condition number improves accordingly. Our work provides a geometric quantification for fast local convergence after active-set or manifold identification and more broadly suggests that linear convergence is fundamentally governed by the geometry of the subsets explored by the algorithm, rather than by worst-case global conditioning.

Abstract PDF Upgrade to Chat

Authors (3)

Summary

The paper introduces trajectory-restricted regularity conditions that replace conservative global bounds by quantifying local geometry through Hoffman constants and restricted smoothness.
It formalizes the equivalence of restricted Polyak–Łojasiewicz and error bound conditions, proving that the active subsets visited by iterates dictate fast, local linear convergence rates.
Empirical studies with LASSO and SVM demonstrate that identifying active sets significantly improves conditioning and accelerates convergence, guiding adaptive algorithm design.

Trajectory-Restricted Convergence and Geometry-Aware Conditioning in First-Order Optimization

Introduction and Motivation

Standard analyses of first-order algorithms in convex and composite optimization typically invoke global regularity conditions—such as the Polyak–Łojasiewicz (PL) inequality, error bounds, and quadratic growth—which guarantee linear convergence with rates determined by worst-case geometric constants evaluated over the full ambient space. Such global constants often scale unfavorably with the problem dimension, especially in high-dimensional settings or when structure (e.g., sparsity) is present, leading to conservative theoretical predictions that are at odds with the empirically observed fast convergence upon entering structured (e.g., sparse or low-rank) solution manifolds.

This paper develops a trajectory-restricted theory of optimization, systematically localizing regularity conditions and their associated geometric constants to subsets relevant to the optimization trajectory. The authors provide formal definitions of restricted PL, error bound, and quadratic growth conditions, encompassing both smooth and polyhedral composite problems, and establish their equivalence on restricted subsets. The main insight is that actual convergence rates are fundamentally governed by geometric quantities (Hoffman constants, restricted smoothness) associated with active faces or manifolds traversed by the algorithm, not by worst-case global bounds.

Formal Framework: Restricted Regularity and Convergence

The foundational definitions in the paper introduce “restricted” versions of the central optimization regularity conditions, requiring them to hold only on subsets $\mathcal{K}$ visited by the iterates. For a composite

$F(x) = f(x) + g(x),$

where $f$ is $L$ -smooth (possibly nonconvex), and $g$ is convex, possibly nonsmooth, the main restricted conditions are:

$\mathcal{K}$ -restricted PL: For all $x \in \mathcal{K}$ ,

$\frac{1}{2}\mathcal{D}_g(x, L) \geq \nu_{\mathcal{K}} (F(x) - F^*),$

where $\mathcal{D}_g$ is the generalized gradient size.

$\mathcal{K}$ -restricted Error Bound (EB): For all $F(x) = f(x) + g(x),$ 0,

$F(x) = f(x) + g(x),$ 1

where $F(x) = f(x) + g(x),$ 2 is the proximal gradient mapping.

Crucially, the paper proves that all standard linear convergence guarantees for proximal-gradient methods hold under these restricted conditions, with contraction rates dictated by the localized constants on the sets actually visited by the iterates. If the iterates enter and remain within a well-conditioned subset $F(x) = f(x) + g(x),$ 3, the remaining optimization enjoys the fast “local” rate

$F(x) = f(x) + g(x),$ 4

where $F(x) = f(x) + g(x),$ 5 is the restricted smoothness (largest local Lipschitz constant) over $F(x) = f(x) + g(x),$ 6. They also establish the equivalence (with explicit constants) between the restricted PL and EB properties under weak invariance assumptions, thereby unifying restricted regularity analysis.

Polyhedral Problems: Local Hoffman Constants and Active-Set Behavior

For polyhedral composite problems

$F(x) = f(x) + g(x),$ 7

with $F(x) = f(x) + g(x),$ 8 strongly convex and $F(x) = f(x) + g(x),$ 9 polyhedral (including indicator constraints), the global Hoffman constant $f$ 0 quantifies the translation of constraint violations to distance from feasibility. The paper’s core theoretical result is that after localization to the set $f$ 1 (e.g., active face or support identified along the trajectory), the restricted PL and EB constants are explicitly governed by the restricted Hoffman constant $f$ 2. This leads to tremendous dimensionality reduction in the effective conditioning once the active structure is identified. In high dimensions, this improvement is dramatic: the global Hoffman constant for the $f$ 3-LASSO problem, for example, scales as $f$ 4, while the active-set restricted value is independent of $f$ 5, scaling only with the intrinsic sparsity.

Figure 1: The Hoffman constant $f$ 6 for LASSO increases as $f$ 7 with ambient dimension; highly correlated designs (spiked models) are strictly worse than Gaussian/Rademacher ensembles.

This phenomenon is formalized in results specifying the two-phase convergence profile observed for ISTA and similar first-order methods: a transient regime governed by conservative, ambient-scale geometry precedes sharp acceleration upon entry into the active-set manifold (e.g., support identification). The local linear rate is then governed by the active submatrix’s spectral and geometric properties, not those of the full design matrix.

LASSO: Empirical and Theoretical Geometry of the Iterates

Using the LASSO as a case study, the paper provides an explicit characterization of the optimal set, active set structure, and strict complementarity margins, with direct calculation of restricted Hoffman and firm convexity constants. Empirically and by analytic bounds,

The global condition number scales as $f$ 8;
The active-set restricted condition number scales as $f$ 9 (for $L$ 0), i.e., a reduction by $L$ 1.

The trajectory of the algorithm is shown to be quickly contained in a specific cone or even the active subspace:

Figure 2: ISTA iterates for LASSO quickly enter structured cones around the optimal support; once the Jaccard similarity between current and optimal support reaches 1, the local manifold has been identified and convergence accelerates.

This observation is robust across matrix ensembles, and designs satisfying the Restricted Isometry Property (RIP) enter the restricted cone and support set more sharply, explaining their improved rates. The analysis further demonstrates that in normalized formulations (with loss scaled by sample size), the local condition number becomes dimension-free ( $L$ 2), aligned with both theoretical and empirical high-dimensional statistics results.

Extensions: SVMs, Non-Polyhedral, and General First-Order Methods

The restrict-to-trajectory paradigm extends to the dual SVM problem, where the effective asymptotic rate is dictated by the geometry (restricted Hoffman constant) of the support vector subspace, not the global matrix—explaining fast convergence in overparameterized or kernelized SVM even when the ambient Gram matrix is ill-conditioned. The same philosophy applies to other composite regularizers (group-lasso, nuclear norm) and first-order solvers (coordinate, block-coordinate, SGD), and the framework is naturally generalized to arbitrary $L$ 3 norms.

The authors argue that this geometric approach demystifies the success of first-order methods in high-dimensional overparameterized regimes, including deep learning, where empirical trajectories concentrate on low-dimensional, favorably conditioned manifolds.

Implications and Future Directions

The trajectory-restricted theory developed here has several impactful consequences:

Geometry-Aware Algorithm Design: Knowledge of active-set entry can allow the use of adaptive step sizes and preconditioning tuned to local rather than ambient geometry, offering substantial practical acceleration.
Beyond Worst-Case Complexity: Practically observed fast convergence rates are justified whenever the algorithm’s trajectory rapidly enters well-conditioned subsets, and worst-case lower bounds are only relevant if iterates are forced to visit ill-conditioned regions.
Extensions to Nonconvex and Deep Learning Regimes: The framework provides an analytic pathway for local geometric analysis in nonconvex optimization, illustrated by the observed low-dimensional structure discovered by neural networks during training.

Conclusion

This work formalizes and quantifies the geometry-driven mechanism underlying linear convergence of first-order methods. By localizing regularity to the subset of the domain actually explored by the iterates, it demonstrates that linear convergence rates are governed by the conditioning of the visited regions—not the global worst-case geometry. For polyhedral and structured composite problems, this is captured by restricted Hoffman constants and restricted smoothness, directly linking optimization dynamics, algorithmic rates, and high-dimensional geometric statistics. The implications extend beyond convexity and suggest that adaptive, geometry-aware methods will be central to efficient optimization in large-scale and overparameterized statistical and machine learning models.

Reference: "Trajectory-Restricted Optimization Conditions and Geometry-Aware Linear Convergence" (2604.17067).

Markdown Report Issue