Approximate Stationarity in Optimization

Updated 11 May 2026

Approximate stationarity is a framework that relaxes exact stationarity by certifying near-criticality through perturbation-based conditions across variational, nonsmooth, and stochastic settings.
It yields necessary optimality conditions under minimal or no constraint qualifications, through variants such as approximate KKT conditions, approximate M-stationarity, and directional notions.
Its methodologies underpin algorithmic certification and finite-time procedures in large-scale, nonconvex, and combinatorial problems, with applications in time series modeling and variational analysis.

Approximate stationarity is a broad framework that establishes quantitative, perturbation-based optimality conditions in variational analysis, nonsmooth/nonconvex optimization, time series modeling, and stochastic process theory. It systematically relaxes exact stationarity, or stationarity-type necessary optimality conditions, by certifying that near-criticality is achieved along sequences or under local perturbations, often in contexts where classical constraint qualifications, regularity, or explicit smoothness do not hold. The concept arises in several guises, including approximate Karush–Kuhn–Tucker (KKT) or Mordukhovich (M-) stationarity in constrained optimization, approximate and near-approximate stationarity for nonsmooth functions, a-stationarity and extremality for set-valued frameworks, and quantitative measures of (approximate) stationarity for stochastic processes and time series. Modern research emphasizes four themes: the necessity of such conditions for local optimality, their attainability under minimal or no constraint qualification, the possibility of extracting exact stationarity under verifiable subset-type conditions, and the role of approximate stationarity in algorithmic certification for large-scale and combinatorial problems.

1. Mathematical Foundations of Approximate Stationarity

Approximate stationarity in deterministic variational and optimization contexts is typically formulated by requiring the existence of sequences of approximately stationary points whose variational residuals vanish. In the context of a constrained optimization problem,

$\min_x f(x) \quad \text{subject to} \quad x \in X \subseteq \mathbb{R}^n$

approximate stationarity of a feasible point $\bar{x}$ means that there is a sequence $x^k \to \bar{x}$ whose stationarity residual vanishes: for every $\varepsilon > 0$ , all sufficiently large $k$ satisfy

$\text{stationarity-residual}(x^k) \leq \varepsilon \,.$
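As a rough illustration of a vanishing stationarity residual, the sketch below assumes a smooth objective on a box $X$ and measures stationarity by the projected-gradient residual $\|x - P_X(x - \nabla f(x))\|$. That choice of residual, the quadratic objective, and the names `project_box`, `residual`, and `projected_gradient` are illustrative assumptions, not constructions from the cited works.

```python
import math

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def grad_f(x):
    """Gradient of the illustrative objective f(x) = 0.5*(x0-2)^2 + 0.5*(x1+1)^2."""
    return [x[0] - 2.0, x[1] + 1.0]

def residual(x, lo, hi):
    """Projected-gradient residual ||x - P_X(x - grad f(x))||; zero iff x is stationary."""
    g = grad_f(x)
    p = project_box([xi - gi for xi, gi in zip(x, g)], lo, hi)
    return math.sqrt(sum((xi - pi) ** 2 for xi, pi in zip(x, p)))

def projected_gradient(x0, lo, hi, eps, step=0.25, max_iter=1000):
    """Iterate until the residual drops to eps; return the point and the residual history."""
    x, history = list(x0), []
    for _ in range(max_iter):
        r = residual(x, lo, hi)
        history.append(r)
        if r <= eps:
            break
        g = grad_f(x)
        x = project_box([xi - step * gi for xi, gi in zip(x, g)], lo, hi)
    return x, history

lo, hi = [0.0, 0.0], [1.0, 1.0]
x_eps, history = projected_gradient([0.5, 0.5], lo, hi, eps=1e-6)
print(x_eps, history)
```

The recorded `history` plays the role of the residual sequence: the iterates approach the constrained minimizer at the corner of the box while the residual decreases to the requested tolerance.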

Key variants:

Classical Approximate M-stationarity (for problems with geometric or set-valued constraints): given $f: \mathbb{R}^n \to \mathbb{R}$ and $F: \mathbb{R}^n \to \mathbb{R}^m$ , for a feasible $\bar{x}$ , there exist $x^k \to \bar{x}, \; \delta^k \to 0, \; \varepsilon^k \to 0, \; \lambda^k \in N_\Gamma(F(x^k) - \delta^k)$ , such that

$\varepsilon^k - \nabla f(x^k) \in \partial \langle \lambda^k, F(\cdot) \rangle (x^k) \,,$

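where $\Gamma \subseteq \mathbb{R}^m$ is the constraint set, so that feasibility of $\bar{x}$ means $F(\bar{x}) \in \Gamma$, and $N_\Gamma$ is a suitable limiting normal cone to $\Gamma$ (Käming et al., 8 May 2026, Käming et al., 28 Mar 2025, Kruger et al., 2021).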

Approximate Stationarity for Nonconvex/Lipschitz Functions: For $f: \mathbb{R}^n \to \mathbb{R}$ locally Lipschitz, a point $x$ is $\varepsilon$-approximately stationary (w.r.t., e.g., the Clarke subdifferential $\partial f$) if

$\operatorname{dist}(0, \partial f(x)) \leq \varepsilon \,,$

and $(\varepsilon, \delta)$-near-approximately stationary (NAS) if there exists $y$ with $\|y - x\| \leq \delta$ such that $\operatorname{dist}(0, \partial f(y)) \leq \varepsilon$ (Tian et al., 6 Jan 2025, Tian et al., 2021).
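The gap between the two notions can be seen on $f(x) = |x|$, whose Clarke subdifferential is $\{\operatorname{sign}(x)\}$ for $x \neq 0$ and $[-1, 1]$ at $0$. The sketch below hard-codes this one-dimensional example; the function names and the $(\varepsilon, \delta)$ parametrization of NAS are assumptions made for illustration.

```python
def clarke_dist_abs(x):
    """dist(0, Clarke subdifferential of |.| at x): 0 at the kink, 1 elsewhere."""
    return 0.0 if x == 0.0 else 1.0

def is_eps_stationary(x, eps):
    """eps-stationarity: the subdifferential at x itself comes within eps of 0."""
    return clarke_dist_abs(x) <= eps

def is_nas(x, eps, delta):
    """(eps, delta)-NAS: some y with |y - x| <= delta is eps-stationary.

    For |.| every y != 0 has residual exactly 1, so for eps < 1 the only
    useful candidate is y = 0, which is admissible iff |x| <= delta.
    """
    if eps >= 1.0:
        return True
    return abs(x) <= delta

x = 1e-3
print(is_eps_stationary(x, 0.1))   # residual at x is 1, so this fails
print(is_nas(x, 0.1, 1e-2))        # the kink at 0 lies within delta of x
print(is_nas(x, 0.1, 1e-4))        # the kink is too far away
```

A point arbitrarily close to the minimizer therefore fails $\varepsilon$-stationarity for every $\varepsilon < 1$, yet qualifies as NAS once $\delta$ covers the distance to the kink.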

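Primal–Dual Formulation in Set Collection (Extremality): A finite family $\{\Omega_1, \ldots, \Omega_N\}$ of closed sets is approximately stationary at $\bar{x} \in \bigcap_{i=1}^N \Omega_i$ if, for every $\varepsilon > 0$, there exist points $\omega_i \in \Omega_i \cap B_\varepsilon(\bar{x})$, shifts $a_i$ with $\max_i \|a_i\| < \varepsilon \rho$, and $\rho \in (0, \varepsilon)$ such that

$\bigcap_{i=1}^N (\Omega_i - \omega_i - a_i) \cap (\rho \mathbb{B}) = \emptyset$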

(Bui et al., 2018). Quantitative dual characterizations (extended extremal principle) are then formulated in terms of generalized normal cones.

Directional and Higher-Order Notions: Approximate directional stationarity requires the approximate stationarity conditions to hold along sequences that converge to the reference point from a prescribed critical direction; mixed-order and coderivative-based variants additionally incorporate higher-order information (Käming et al., 8 May 2026, Benko et al., 2022).

Approximate stationarity is a necessary condition for local optimality in broad classes of deterministic and stochastic programs, without requiring any constraint qualification. It therefore serves as a baseline necessary (and, in some settings, sufficient) condition for optimality analysis in nonsmooth, set-constrained, or composite optimization.

2. Approximate Stationarity in Optimization: Constraint Systems and Qualification Conditions

In modern constrained optimization, especially with disjunctive, complementarity, or geometric constraints, classical optimality conditions (strong stationarity, M-stationarity, KKT) hold at local minimizers only under constraint qualifications (CQs) such as MFCQ or a generalized Guignard CQ. Approximate stationarity, by contrast, holds at local minimizers without any CQ; extracting exact stationarity from such asymptotic certificates requires only mild subset-type conditions:

Concept	Sequence Condition	Extracts Exact Stationarity if...
Approximate stationarity	$x^k \to \bar{x}$, $\delta^k \to 0$, $\varepsilon^k \to 0$	AM-reg CQ or subset-MFCQ (finitely verifiable)
Directional/Mixed-order	Sequence converges along direction $d$ with residual constraints	directional AM-reg CQ or subMFC(d) (single sequence)

Subset-MFCQ and SubMFC Qualifications: For problems of disjunctive or orthodisjunctive form (constraints $F(x) \in \Gamma$ with $\Gamma$ a finite union of polyhedral sets), subset-MFCQ reduces to checking positive linear independence of the limiting gradients for one approximately stationary sequence, and is therefore verifiable post hoc (Käming et al., 8 May 2026, Käming et al., 28 Mar 2025).
Asymptotic Regularity/AM-regularity: Limsup-type conditions on sequences ensure passage to exact stationarity, but are milder than full (e.g., classical) CQs (Käming et al., 28 Mar 2025, Kruger et al., 2021).

These subset-type CQs are crucial in structured and large-scale nonsmooth optimization where full constraint qualifications may be stringent or intractable.
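A small worked example may help show why approximate stationarity survives without any CQ. The sketch below uses the toy problem $\min x$ subject to $x^2 \leq 0$, which is an illustrative choice and not taken from the cited works. Its only feasible point is $\bar{x} = 0$, where the KKT condition $1 + \lambda \cdot 2\bar{x} = 0$ has no solution. The sequence is generated with a plain quadratic penalty (an assumed, simplified stand-in for multiplier methods), whose stationary points are available in closed form.

```python
def akkt_sequence(rhos):
    """Approximate-KKT data for: min x  s.t.  x^2 <= 0, via a quadratic penalty.

    For penalty parameter rho, the penalized objective x + (rho/2) * max(0, x^2)^2
    has the stationary point x = -(2*rho)^(-1/3).
    """
    rows = []
    for rho in rhos:
        x = -(2.0 * rho) ** (-1.0 / 3.0)   # stationary point of the penalized problem
        lam = rho * x * x                   # multiplier estimate rho * max(0, g(x))
        eps = abs(1.0 + lam * 2.0 * x)      # Lagrangian-gradient residual
        delta = max(0.0, x * x)             # constraint violation
        rows.append((rho, x, lam, eps, delta))
    return rows

rows = akkt_sequence([10.0 ** k for k in range(0, 13, 3)])
for rho, x, lam, eps, delta in rows:
    print(f"rho={rho:.0e}  x={x:.3e}  lambda={lam:.3e}  eps={eps:.1e}  delta={delta:.3e}")
```

Along this sequence the gradient residual and the constraint violation both tend to zero while the multipliers diverge, which is the typical signature of a point that is approximately stationary but not exactly stationary.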

3. Approximate Stationarity in Nonsmooth and Nonconvex NLO: Subdifferential Calculus, Algorithms, and Complexity

Approximate stationarity generalizes to nonsmooth and nonconvex problems using generalized subdifferentials (Clarke, Fréchet/regular, limiting/Mordukhovich):

Subdifferential Notions:
- Clarke: $\partial f(x) = \operatorname{conv}\{\lim_{k \to \infty} \nabla f(x^k) : x^k \to x,\ f \text{ differentiable at } x^k\}$ (for locally Lipschitz $f$)
- Fréchet: $\hat{\partial} f(x) = \{v : \liminf_{y \to x,\, y \neq x} \frac{f(y) - f(x) - \langle v, y - x \rangle}{\|y - x\|} \geq 0\}$
- Limiting: outer limit of regular subdifferentials.
Approximate Stationarity Definitions:
- $\varepsilon$-stationarity: $\operatorname{dist}(0, \partial f(x)) \leq \varepsilon$.
- $(\varepsilon, \delta)$-NAS: $\exists\, y \in B_\delta(x)$ s.t. $\operatorname{dist}(0, \partial f(y)) \leq \varepsilon$.
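For a maximum of finitely many affine functions, the Clarke subdifferential at a point is the convex hull of the gradients of the active pieces, so the residual $\operatorname{dist}(0, \partial f(x))$ is a minimum-norm problem over a polytope. The sketch below approximates it with a Frank–Wolfe iteration using exact line search; the solver choice, the helper names, and the example function are assumptions made for illustration, not the algorithms of the cited papers.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def active_gradients(A, b, x, tol=1e-9):
    """Gradients a_i of the pieces a_i.x + b_i attaining the max at x (up to tol)."""
    vals = [dot(a, x) + bi for a, bi in zip(A, b)]
    top = max(vals)
    return [a for a, v in zip(A, vals) if top - v <= tol]

def min_norm_in_hull(G, iters=2000):
    """Frank-Wolfe with exact line search for min ||g||^2 over conv(G)."""
    g = list(G[0])
    for _ in range(iters):
        s = min(G, key=lambda v: dot(g, v))      # linear minimization oracle
        d = [si - gi for si, gi in zip(s, g)]
        dd = dot(d, d)
        if dd == 0.0:
            break
        gamma = min(1.0, max(0.0, -dot(g, d) / dd))
        if gamma == 0.0:                          # no descent along d: stop
            break
        g = [gi + gamma * di for gi, di in zip(g, d)]
    return g

def clarke_residual(A, b, x):
    """Approximate dist(0, Clarke subdifferential) for f(x) = max_i (a_i.x + b_i)."""
    g = min_norm_in_hull(active_gradients(A, b, x))
    return math.sqrt(dot(g, g))

# f(x) = max(x0 + x1, -x0 + x1, -x1), minimized at the origin
A = [[1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]]
b = [0.0, 0.0, 0.0]
for point in ([0.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(point, clarke_residual(A, b, point))
```

At the origin all three pieces are active and the hull contains zero, so the residual is numerically zero; on the kink through `[0.0, 1.0]` two pieces are active and the residual is 1; at `[1.0, 1.0]` a single piece is active and the residual is the norm of its gradient.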

Complexity results:

Testing $\varepsilon$-stationarity for piecewise affine (PA) functions is strongly NP-hard in general for the Clarke subdifferential and co-NP-hard for the Fréchet subdifferential (Tian et al., 6 Jan 2025, Tian et al., 2023).
Polynomial-time tractability can be restored for specific relaxations (sum rules under compatibility), with tight geometric conditions for exactness (polytopal compatibility/transversality) (Tian et al., 6 Jan 2025).

Algorithmic contributions include robust rounding-plus-test schemes that, under mild span-qualification conditions, provide certified NAS tests in polynomial time, for example for two-layer ReLU networks (Tian et al., 2023), and finite-time algorithms for computing NAS points when the subdifferential is locally outer-Lipschitz continuous (Tian et al., 2021).

4. Approximate Stationarity in Stochastic Processes and Time Series

Approximate stationarity has a well-developed theory in the context of stochastic processes, notably for describing processes that are "locally stationary" or "stationary in the limit," as well as providing quantitative metrics for the approximation of nonstationary data structures.

Locally Stationary Processes: A triangular array $\{X_{n,k} : 1 \leq k \leq n\}$ governed by time-inhomogeneous Markov kernels $Q_{k/n}$ is locally stationary if, for each $j \geq 1$, there is a constant $C_j$ such that

$d\big(\pi^{(n)}_{k,j}, \pi_{u,j}\big) \leq C_j \left( \left| u - \tfrac{k}{n} \right| + \tfrac{1}{n} \right) \quad \text{for all } u \in [0,1],\ 1 \leq k \leq n - j + 1 \,,$

where $d$ is a probability metric, $\pi^{(n)}_{k,j}$ is the law of $(X_{n,k}, \ldots, X_{n,k+j-1})$, and $\pi_{u,j}$ is the $j$-point stationary law for the "frozen" kernel at $u$ (Truquet, 2016, Stelzer et al., 2021).

Functional Metrics for Stationarity: The $L^2$-distance between the local spectrum $f(u, \lambda)$ and its stationary approximation $\bar{f}(\lambda) = \int_0^1 f(u, \lambda) \, du$, given up to a normalizing constant by

$D^2 = \int_0^1 \int_{-\pi}^{\pi} \big( f(u, \lambda) - \bar{f}(\lambda) \big)^2 \, d\lambda \, du \,,$

provides a quantitative measure for deviation from stationarity. Estimators based on periodogram Riemann sums, with associated CLTs and bootstrap (FARI($\infty$)) critical values, enable sharp hypothesis testing for approximate stationarity under long memory (Sen et al., 2013, Preuß et al., 2013).
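To make the $L^2$-type measure concrete, the sketch below evaluates the unnormalized quantity $\int_0^1 \int_{-\pi}^{\pi} (f(u,\lambda) - \bar{f}(\lambda))^2 \, d\lambda \, du$, with $\bar{f}(\lambda) = \int_0^1 f(u,\lambda)\,du$, by midpoint Riemann sums for a time-varying AR(1) model with known spectrum. It works with the true spectrum rather than a periodogram-based estimator, and the model, the coefficient paths, and the function names are assumptions chosen for illustration.

```python
import math

def tvar1_spectrum(u, lam, a, sigma2=1.0):
    """Local spectral density of a time-varying AR(1) with coefficient a(u)."""
    phi = a(u)
    return sigma2 / (2.0 * math.pi * (1.0 - 2.0 * phi * math.cos(lam) + phi * phi))

def l2_distance_sq(a, n_u=200, n_lam=400):
    """Midpoint-rule approximation of the squared L2 distance to stationarity."""
    us = [(i + 0.5) / n_u for i in range(n_u)]
    lams = [-math.pi + (j + 0.5) * 2.0 * math.pi / n_lam for j in range(n_lam)]
    total = 0.0
    for lam in lams:
        col = [tvar1_spectrum(u, lam, a) for u in us]
        mean = sum(col) / n_u                       # stationary approximation at lam
        total += sum((c - mean) ** 2 for c in col) / n_u
    return total * 2.0 * math.pi / n_lam

d_const = l2_distance_sq(lambda u: 0.5)             # stationary AR(1)
d_drift = l2_distance_sq(lambda u: 0.1 + 0.6 * u)   # slowly drifting coefficient
print(d_const, d_drift)
```

A constant coefficient gives a distance of essentially zero, while a drifting coefficient gives a strictly positive value; a test for approximate stationarity would compare an estimate of this quantity against a threshold.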

Model-Specific Approximation: In CKSVARs (structural VARs with censored/kinked regimes), the region of approximate stationarity is sharply delimited using relaxed and constrained joint spectral radii (RJSR, CJSR), computable with SDP/SOS relaxations (Duffy et al., 2023).
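The SDP/SOS relaxations themselves are beyond a short example. As a much cruder stand-in, the sketch below brackets the joint spectral radius (JSR) of a finite set of matrices using the standard bounds $\max_P \rho(P)^{1/k} \leq \mathrm{JSR} \leq \max_P \|P\|^{1/k}$ over all products $P$ of length $k$. The two $2 \times 2$ regime matrices are hypothetical values, and the brute-force enumeration is an assumption made for illustration, not the method of the cited work.

```python
import itertools
import math

def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def spectral_radius_2x2(A):
    """Largest eigenvalue modulus of a real 2x2 matrix via trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        r = math.sqrt(disc)
        return max(abs((tr + r) / 2.0), abs((tr - r) / 2.0))
    return math.sqrt(det)   # complex-conjugate pair: squared modulus equals det

def inf_norm(A):
    """Induced infinity norm (maximum absolute row sum)."""
    return max(sum(abs(v) for v in row) for row in A)

def jsr_bounds(mats, k):
    """Lower and upper JSR bounds from all products of length k."""
    lower, upper = 0.0, 0.0
    for combo in itertools.product(mats, repeat=k):
        P = combo[0]
        for M in combo[1:]:
            P = matmul(P, M)
        lower = max(lower, spectral_radius_2x2(P) ** (1.0 / k))
        upper = max(upper, inf_norm(P) ** (1.0 / k))
    return lower, upper

# Hypothetical regime matrices
A1 = [[0.5, 0.4], [0.0, 0.3]]
A2 = [[0.3, 0.0], [0.4, 0.5]]
lower, upper = jsr_bounds([A1, A2], k=4)
print(lower, upper)
```

An upper bound below 1 certifies that every switching sequence between the two regimes is contracting. The enumeration grows exponentially in `k`, which is one reason convex relaxations are preferred in practice.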

5. Generalizations: Beyond Lipschitzness, Composite/Geometric Constraints, and Extremality

Recent developments extend approximate stationarity to Banach spaces with lower semicontinuous but non-Lipschitz objectives:

General Ekeland Variational Principle Approach: Approximate necessary optimality conditions are established using the fuzzy Fréchet sum rule and a notion of relative lower semicontinuity, yielding robust dual inclusions even without convexity or classical CQ (Kruger et al., 2021).
Composite/Geometric/Control Applications: The framework yields limiting necessary conditions for systems combining composite and geometric constraints and, under a uniform qualification condition, upgrades approximate to exact M-stationarity. Penalization and multiplier methods (e.g., augmented Lagrangian methods) naturally produce approximate stationarity certificates; the theory extends to infinite-dimensional settings and non-Lipschitz regularizers (e.g., sparsity-promoting terms in optimal control).
Extremality and Set Collection: The extended extremal principle asserts equivalence between approximate (primal) stationarity for collections of sets and a generalized separation (dual) property involving sums of normals; this extends the classical convex separation theorem to collections of nonconvex sets (Bui et al., 2018).

6. Practical Implications: Computability, Algorithmic Certification, and Limitations

Practical strategies for certifying approximate stationarity fall under several classes:

Relaxed Stationarity Tests: Algorithms exploiting tractable subdifferential sum rules (when valid) and geometric subset conditionings (e.g., span-qualification in ReLU networks) can provide robust polynomial-time certification for NAS (Tian et al., 2023, Tian et al., 6 Jan 2025).
Finite-Time Algorithms: Under mild outer-Lipschitz continuity or similar assumptions, perturbation-based iterations find Goldstein approximately stationary (GAS) or NAS points in finite time with explicit sample complexity (Tian et al., 2021). For constrained nonconvex optimization, dynamic second-order Frank–Wolfe methods deliver approximate second-order stationarity when the constrained QP subroutines are tractable (Nouiehed et al., 2018).
Limitations and Hardness: General (even first-order) approximate stationarity testing is NP-hard (or co-NP-hard) outside restricted cases, and except under sequence-specific or subset-type regularity, approximate stationarity may not imply exact stationarity (Tian et al., 6 Jan 2025, Tian et al., 2023, Nouiehed et al., 2018). In general, approximate stationarity is necessary but not sufficient for optimality, so attention to qualification or regularity conditions is essential.
Stochastic/Time Series: Hypothesis tests for approximate stationarity (via resampling, empirical processes, or $L^2$-distance) require careful attention to long memory, local spectral characteristics, and sample-size-dependent bandwidths to achieve correct type I error and power levels (Sen et al., 2013, Preuß et al., 2013, Nason, 2016).

7. Future Directions and Open Problems

Broadening Structural Conditions: Relaxing the outer-Lipschitz (OLC) requirement for NAS to noncompact domains, or further weakening qualification assumptions in constraint systems, remains a significant direction (Tian et al., 2021, Käming et al., 8 May 2026).
High-Order/Dynamic Regimes: The interface of mixed-order (critical direction/pseudo-coderivative of order $\gamma$) and dynamic or stochastic contexts presents many open questions in characterizing when approximate stationarity upgrades to exactness or when local minimizers can be selected via limiting arguments (Benko et al., 2022).
Algorithmic Certification in Deep Learning: Developing robust, computationally feasible stopping criteria and certification tools for nonsmooth neural network landscapes is ongoing, as is the transferability of NAS-type results to practically relevant deep and composite models (Tian et al., 2023).
Unified Extremality Frameworks: Deepening the connection between primal approximate stationarity, dual generalized separation, and regularity/transversality notions in infinite and high-dimensional settings offers a unifying perspective for future theoretical analysis (Bui et al., 2018).

Across these domains, approximate stationarity serves as a flexible, unifying mechanism in variational analysis, connecting the geometric and algorithmic structure underlying modern optimization and stochastic modeling.