Nonlinear Approximation Characteristics
- Nonlinear approximation approximates functions using adaptive, data-driven selections from flexible dictionaries, yielding superior accuracy compared to linear methods.
- It leverages compositions and deep architectures, such as ReLU networks, to achieve exponential or rate-doubled convergence under certain smoothness assumptions.
- Applications include efficient high-dimensional PDE solvers, stable data assimilation, and compressive sensing where stability and optimal error decay are crucial.
Nonlinear approximation characteristics encompass the theory, methodologies, and performance metrics associated with approximating functions or operators via mappings that are nonlinear in their parameters, selection, or construction. In contrast to classical (linear) approximation, which utilizes fixed subspaces or bases, nonlinear approximation exploits the flexibility of adaptive or data-driven choices, compositions, and other nonlinear mechanisms to achieve superior accuracy for a wider range of target functions and under broader model assumptions.
1. Definitions and Metrics in Nonlinear Approximation
A central framework is best $N$-term nonlinear approximation, where, for a function $f$ and a "dictionary" $\mathcal{D}$ of admissible functions, the goal is to approximate $f$ by a linear combination of $N$ elements optimally chosen from $\mathcal{D}$:
$\sigma_N(f) := \inf_{g_1,\dots,g_N \in \mathcal{D}} \; \inf_{c_1,\dots,c_N} \Big\| f - \sum_{i=1}^{N} c_i g_i \Big\|$
The metric of interest is how $\sigma_N(f)$ decays as $N$ increases, reflecting the approximation efficiency. Typical choices of $\mathcal{D}$ include bases or frames (e.g., wavelets, splines, kernels, neural network parameterizations), and the "nonlinear" aspect refers to optimizing the selection of these $N$ terms for each target $f$, rather than committing to a fixed collection (Shen et al., 2019).
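As a minimal, self-contained illustration (the basis, target, and parameters below are chosen for demonstration and are not taken from the cited works), the following sketch computes a best $N$-term approximation in an orthonormal discrete cosine basis by keeping the $N$ largest-magnitude coefficients; the index set is chosen adaptively per target $f$, which is precisely the nonlinear ingredient:

```python
import numpy as np
from scipy.fft import dct, idct  # orthonormal DCT as a stand-in dictionary

def best_n_term(f_samples, N):
    """Keep the N largest-magnitude coefficients in an orthonormal DCT basis.

    For an orthonormal basis this thresholding is exactly the best N-term
    approximation in the L2 (Euclidean) norm.
    """
    c = dct(f_samples, norm="ortho")          # analysis: expansion coefficients
    keep = np.argsort(np.abs(c))[-N:]         # adaptive, f-dependent index set
    c_sparse = np.zeros_like(c)
    c_sparse[keep] = c[keep]
    return idct(c_sparse, norm="ortho")       # synthesis from the N chosen terms

# Example: a piecewise-smooth target; the selected index set adapts to f.
x = np.linspace(0.0, 1.0, 512)
f = np.sin(2 * np.pi * x) + (x > 0.6)         # smooth part plus a jump
for N in (8, 32, 128):
    err = np.linalg.norm(f - best_n_term(f, N)) / np.sqrt(len(x))
    print(f"N = {N:4d}   RMS error = {err:.3e}")
```

For redundant (non-orthogonal) dictionaries, simple thresholding no longer suffices and greedy selection schemes take its place (cf. Section 3.3).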
For multivariate and tensor-product settings, similar constructs arise (e.g., best $M$-term tensor product approximations (Bazarkhanov et al., 2014)).
2. Fundamental Theorem Classes and Rate Results
2.1 Classical and Kernel-Based Best $N$-Term Rates
For many canonical function spaces (e.g., Sobolev, Triebel–Lizorkin, Besov), best $N$-term approximation with suitably regular kernel families, wavelets, or splines satisfies (Hamm et al., 2016):
$\sigma_N(f)_{L^p} \lesssim N^{-s/d}$
where $N$ is the number of kernel terms, $s$ is the smoothness parameter, and $d$ the ambient dimension.
2.2 Composition and Deep Learning Regimes
When dictionary elements are compositions (e.g., compositional neural networks with $L$ layers), depth can dramatically accelerate best $N$-term rates; a minimal folding sketch follows this list:
- Depth-1 (shallow): rate $N^{-\alpha/d}$ for Hölder-$\alpha$ targets.
- Depth-2: rate $N^{-2\alpha/d}$; the exponent doubles.
- Depth-3 and higher: for Hölder-$\alpha$ functions on $[0,1]^d$, $N^{-2\alpha/d}$ is already optimal; extra layers ($L \ge 3$) give no further gain (Shen et al., 2019).
For one-dimensional targets with only continuity (not Hölder smoothness), the doubling of the rate at $L = 2$ still occurs.
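A minimal sketch of the folding mechanism behind these gains (illustrative only; not the construction of Shen et al., 2019): a three-ReLU block realizes the "hat" map on $[0,1]$, and composing $L$ copies produces $2^L$ linear pieces from only $O(L)$ parameters, whereas a shallow network needs width proportional to the number of pieces.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def hat(x):
    # Hat map g(x) = 2x on [0, 1/2], 2 - 2x on [1/2, 1], written with ReLUs:
    # g(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1)   (3 units per layer)
    return 2 * relu(x) - 4 * relu(x - 0.5) + 2 * relu(x - 1.0)

def composed_hat(x, L):
    # L-fold composition: a depth-L ReLU network with O(L) parameters.
    for _ in range(L):
        x = hat(x)
    return x

x = np.linspace(0.0, 1.0, 2001)
for L in (1, 2, 4):
    y = composed_hat(x, L)
    # Count linear pieces by counting slope changes on the fine grid.
    slopes = np.round(np.diff(y) / np.diff(x), 6)
    pieces = 1 + np.count_nonzero(np.diff(slopes))
    print(f"depth L = {L}:  {pieces} linear pieces (expected 2**L = {2**L})")
```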
2.3 Superiority of Deep ReLU Networks
Certain function classes exhibit exponential (Shannon-type) rates for deep ReLU networks: Takagi-type and self-similar functions admit approximations whose error decays like $2^{-cN}$ in the number of network parameters $N$, while best spline or wavelet dictionaries deteriorate to polynomial rates (Daubechies et al., 2019).
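A numerical sketch of this exponential behavior for the classical Takagi function (the grid, reference depth, and constants are illustrative assumptions, not those of Daubechies et al., 2019): the $K$-term partial sum is a sum of $K$ composed hat maps, realizable by a ReLU network whose size grows linearly in $K$, with sup-norm error at most $2^{-K}$.

```python
import numpy as np

def hat(x):
    # One ReLU "hat" block: 2x on [0, 1/2], 2 - 2x on [1/2, 1].
    return 2 * np.maximum(x, 0) - 4 * np.maximum(x - 0.5, 0) + 2 * np.maximum(x - 1.0, 0)

def takagi_partial_sum(x, K):
    # T_K(x) = sum_{k=1}^{K} 2^{-k} * hat^{(k)}(x): realizable by a ReLU network
    # of depth O(K) (the composed hats) plus a running-sum skip channel.
    total, h = np.zeros_like(x), x
    for k in range(1, K + 1):
        h = hat(h)
        total += h / 2.0**k
    return total

x = np.linspace(0.0, 1.0, 4097)
reference = takagi_partial_sum(x, 40)        # proxy for the full Takagi function
for K in (2, 4, 8, 16):
    err = np.max(np.abs(reference - takagi_partial_sum(x, K)))
    print(f"K = {K:2d}  sup error = {err:.2e}   bound 2^-K = {2.0**-K:.2e}")
```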
2.4 Nonlinear Tensor Product Approximation
For the periodic class $W^r_q$ of mixed smoothness in $d$ variables, with $S_M$ ranging over $M$-term tensor-product sums:
$e_M(W^r_q)_p := \sup_{f \in W^r_q} \inf_{S_M} \|f - S_M\|_{L^p} \sim (M\ln M)^{-rd/(d-1)}$
Upper and lower bounds match up to logarithmic factors; constructive greedy algorithms attain similar exponents (Bazarkhanov et al., 2014).
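In the bivariate case, the best $M$-term tensor-product (rank-$M$) approximation in $L^2$ reduces to a truncated SVD of the sampled function, which gives a concrete, if simplified, picture of nonlinear tensor-product approximation (the target function below is an illustrative choice, unrelated to the classes and constants of Bazarkhanov et al., 2014):

```python
import numpy as np

# Sample a smooth bivariate function on a grid; approximate it by a sum of
# M separable products u_k(x) * v_k(y) (a nonlinear, f-dependent choice).
n = 256
x = np.linspace(0.0, 1.0, n)
F = np.exp(-(x[:, None] - x[None, :])**2 * 8.0) * np.cos(3.0 * x[:, None] * x[None, :])

U, s, Vt = np.linalg.svd(F, full_matrices=False)
for M in (1, 2, 4, 8):
    F_M = (U[:, :M] * s[:M]) @ Vt[:M, :]     # best rank-M approximation (Eckart-Young)
    rel = np.linalg.norm(F - F_M) / np.linalg.norm(F)
    print(f"M = {M}:  relative L2 error = {rel:.2e}")
```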
2.5 Restricted and Weighted Approximation
The introduction of general measures for restricted $N$-term approximation leads to a full characterization of approximation spaces using weighted Lorentz sequence spaces and the upper/lower Temlyakov property, unifying Jackson and Bernstein inequalities with approximation embeddings (Hernández et al., 2011).
3. Methodological Frameworks
3.1 Compositional Dictionaries
Composition of shallow networks or blocks improves expressivity. The rate-doubling ($L = 2$) and rate-saturation ($L = 3$) phenomena are both tied to the combinatorial growth of the function landscape under composition and tiling: in $d$ dimensions, the domain is tiled into sub-cubes whose number grows with the term budget $N$, each cube attaining a local error governed by the Hölder exponent $\alpha$ (Shen et al., 2019).
3.2 Kernel and Approximation Families
Regular families of decaying or growing kernels—encompassing Gaussians, multiquadrics, cardinal functions—can be systematically analyzed for their $N$-term rates by verifying translation, dilation, and Poisson summation properties (hypotheses (A1)–(A6)) (Hamm et al., 2016). These kernels enable nonlinear spaces that match the performance of best wavelet expansions.
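A minimal greedy $N$-term selection over a dictionary of Gaussian translates, in the spirit of kernel-based nonlinear approximation (centers, bandwidth, and target are illustrative assumptions, not the constructions of Hamm et al., 2016):

```python
import numpy as np

# Dictionary: Gaussian bumps exp(-(x - c)^2 / (2*h^2)) at many candidate centers c.
x = np.linspace(-1.0, 1.0, 400)
centers = np.linspace(-1.0, 1.0, 80)
h = 0.08
D = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * h**2))
D /= np.linalg.norm(D, axis=0)                      # normalize dictionary columns

f = np.sign(x) * np.sqrt(np.abs(x))                 # target with a kink at 0

def omp(D, f, N):
    """Greedy N-term selection (orthogonal matching pursuit)."""
    chosen, residual = [], f.copy()
    for _ in range(N):
        chosen.append(int(np.argmax(np.abs(D.T @ residual))))   # best-correlated atom
        coef, *_ = np.linalg.lstsq(D[:, chosen], f, rcond=None) # refit on chosen atoms
        residual = f - D[:, chosen] @ coef
    return chosen, residual

for N in (5, 10, 20, 40):
    _, r = omp(D, f, N)
    print(f"N = {N:2d}  relative error = {np.linalg.norm(r)/np.linalg.norm(f):.2e}")
```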
3.3 Greedy and Library-Based Schemes
For high-dimensional parametric PDEs and analytic function classes with anisotropy, adaptive library-based piecewise Taylor approximations subdivide the parameter space and select local low-dimensional spaces for each cell, achieving quantifiable error bounds whose complexity grows only logarithmically or subexponentially in the inverse error tolerance; whether the curse of dimensionality is broken or merely mitigated depends on the anisotropy decay (Guignard et al., 2022).
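A drastically simplified one-parameter analogue of the subdivide-and-fit idea (not the algorithm of Guignard et al., 2022; the target and tolerance are illustrative): bisect the cell with the worst local first-order Taylor error until a tolerance is met, so refinement concentrates where the target varies fastest.

```python
import numpy as np

def adaptive_taylor(f, df, a, b, tol, max_cells=1024):
    """Greedy bisection with per-cell first-order Taylor surrogates about cell centers."""
    cells = [(a, b)]
    while len(cells) < max_cells:
        def cell_err(c):
            lo, hi = c
            t = np.linspace(lo, hi, 33)
            mid = 0.5 * (lo + hi)
            # Sampled worst-case error of the local Taylor surrogate on the cell.
            return np.max(np.abs(f(t) - (f(mid) + df(mid) * (t - mid))))
        errs = [cell_err(c) for c in cells]
        worst = int(np.argmax(errs))
        if errs[worst] <= tol:
            break
        lo, hi = cells.pop(worst)                  # refine only where needed
        mid = 0.5 * (lo + hi)
        cells += [(lo, mid), (mid, hi)]
    return cells

f  = lambda t: np.exp(-40.0 * t**2)                # sharp feature near t = 0
df = lambda t: -80.0 * t * np.exp(-40.0 * t**2)
cells = adaptive_taylor(f, df, -1.0, 1.0, tol=1e-3)
widths = [hi - lo for lo, hi in cells]
print(f"{len(cells)} cells; smallest width {min(widths):.4f}, largest {max(widths):.4f}")
```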
3.4 Choquet and Nonlinear Integral Operators
Nonlinear extension of classical constructive schemes via the Choquet integral leads to Bernstein–Choquet and Picard–Choquet operators. These exhibit improved rates for certain function classes (monotone/concave or exponentials), outperforming classical linear positive operators in those regimes (Gal, 2014).
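For concreteness, a discrete Choquet integral with a distorted uniform capacity $\mu(A) = (|A|/n)^{\gamma}$, a standard illustrative (non-additive) choice not tied to Gal's specific operators:

```python
import numpy as np

def choquet_integral(values, capacity):
    """Discrete Choquet integral of nonnegative values w.r.t. a capacity on index sets.

    Standard formula: sort values increasingly and sum
    (v_(i) - v_(i-1)) * capacity({indices with value >= v_(i)}).
    """
    v = np.asarray(values, dtype=float)
    order = np.argsort(v)
    total, prev = 0.0, 0.0
    for rank, idx in enumerate(order):
        level_set = order[rank:]                     # indices holding the largest values
        total += (v[idx] - prev) * capacity(level_set)
        prev = v[idx]
    return total

n = 5
values = [0.2, 0.9, 0.4, 0.7, 0.1]
gamma = 0.5                                          # concave distortion of the measure
capacity = lambda A: (len(A) / n) ** gamma           # monotone, non-additive set function
print("Choquet integral :", choquet_integral(values, capacity))
print("Ordinary average :", float(np.mean(values)))  # gamma = 1 recovers this value
```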
3.5 Algorithmic and Computational Aspects
Many nonlinear approximation methods, particularly those involving nonconvex selection or parameter search (e.g., kernel parameter grids, nonnegative least squares for rational/exponential approximations (Vabishchevich, 2023)), employ iterative or greedy algorithms. Effective discretization, active-set NNLS, and QR-based stabilization are standard techniques; provable convergence properties may be lacking in fully nonlinear parameter regimes.
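A minimal sketch of the NNLS-based idea (the target, sample grid, and candidate decay rates are illustrative assumptions): fit $f(t) \approx \sum_j c_j e^{-\lambda_j t}$ with $c_j \ge 0$ over a fixed grid of rates $\lambda_j$ using an active-set solver.

```python
import numpy as np
from scipy.optimize import nnls

# Target: approximate 1/(1 + t) on [0, 10] by a nonnegative sum of exponentials.
t = np.linspace(0.0, 10.0, 400)
f = 1.0 / (1.0 + t)

lambdas = np.logspace(-2, 2, 60)                 # grid of candidate decay rates
A = np.exp(-t[:, None] * lambdas[None, :])       # design matrix A[i, j] = exp(-lambda_j * t_i)

coef, resid = nnls(A, f)                         # active-set nonnegative least squares
n_active = np.count_nonzero(coef > 1e-12)
print(f"active exponentials: {n_active},  residual (l2): {resid:.2e}")
```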
4. Stability, Manifold Widths, and Optimality
Realistic nonlinear approximation must account for numerical stability, most prominently captured by the notion of stable manifold widths $\delta^*_{n,\gamma}(K)_X$, where $\gamma$ is the Lipschitz constant imposed on the encoder/decoder pair. These widths are intimately connected to the entropy numbers $\varepsilon_n(K)_X$ measuring the compactness of the model class $K$. Fundamental consequences (Cohen et al., 2020):
- For Hilbert spaces, stable widths and entropy numbers are equivalent up to constants.
- In Banach spaces, enforcing $\gamma$-Lipschitz continuity in encoder/decoder bounds the possible approximation rates by entropy—precluding "faster-than-entropy" rates.
- For unit Lipschitz-bounded function classes (e.g., the unit ball of Lip([0,1])), enforcing stability forces $O(n^{-1})$ error decay, even as unstable approximations (deep ReLU nets with arbitrary parameterization) can attain $O(n^{-2})$.
5. Specialized and Emerging Regimes
5.1 Quadratic and Algebraic Manifolds
The quadratic formula–based degree-2 nonlinear approximation constructs closed-form smooth coefficient manifolds to represent single-variable functions as roots of degree-2 polynomials with a learned index function for branch selection. This yields global exponential convergence across discontinuities (unlike linear/rational schemes), as the algebraic variety encodes jumps sharply and enables effective edge-preserving denoising (He et al., 6 Dec 2025).
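A toy version of the branch-selection idea only, far simpler than the construction in (He et al., 6 Dec 2025): a unit step is recovered exactly as a root of $y^2 + b(t)y + c(t) = 0$ with smooth (here constant) coefficients, the jump being carried entirely by a discrete branch index.

```python
import numpy as np

# Represent the jump function f(t) = sign(t) (with f(0) = +1) as a selected root of
# y**2 + b(t)*y + c(t) = 0.  The coefficients are smooth (constant); all the
# discontinuity is carried by the branch index s(t) in {-1, +1}.
b = lambda t: np.zeros_like(t)               # smooth coefficient (constant here)
c = lambda t: -np.ones_like(t)               # smooth coefficient (constant here)
s = lambda t: np.where(t >= 0, 1.0, -1.0)    # encoded branch index

def reconstruct(t):
    disc = np.sqrt(b(t)**2 - 4.0 * c(t))     # discriminant
    return (-b(t) + s(t) * disc) / 2.0       # quadratic formula, chosen branch

t = np.linspace(-1.0, 1.0, 9)
print(np.allclose(reconstruct(t), np.sign(t) + (t == 0)))   # exact reconstruction
```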
5.2 Piecewise-Affine and Cut-Based Schemes
Multi-dimensional nonlinear functions can be efficiently approximated by iteratively partitioning the domain using hinging hyperplanes and fitting local affine surrogates (PWA). With adaptive cut selection, continuity enforcement, and region complexity increased only as needed, the scheme attains comparable accuracy with far fewer regions than mesh-recursive baselines (Gharavi et al., 2024).
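A much-simplified sketch of the partition-and-fit loop (axis-aligned median splits instead of hinging-hyperplane cuts, and no continuity enforcement; the target and tolerances are illustrative):

```python
import numpy as np

def fit_affine(X, y):
    """Least-squares affine fit y ~ X @ w + w0; returns coefficients and max error."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, np.max(np.abs(A @ coef - y))

def pwa_partition(X, y, tol, depth=0, max_depth=8):
    """Recursively split along the widest axis until each cell's affine fit meets tol."""
    _, err = fit_affine(X, y)
    if err <= tol or depth >= max_depth or len(X) < 20:
        return 1                                          # one affine region suffices here
    axis = int(np.argmax(X.max(axis=0) - X.min(axis=0)))  # split the widest coordinate
    cut = np.median(X[:, axis])
    left = X[:, axis] <= cut
    return (pwa_partition(X[left], y[left], tol, depth + 1, max_depth) +
            pwa_partition(X[~left], y[~left], tol, depth + 1, max_depth))

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(4000, 2))
y = np.tanh(4.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])        # smooth but curved target
for tol in (0.2, 0.1, 0.05):
    print(f"tol = {tol}:  {pwa_partition(X, y, tol)} affine regions")
```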
5.3 Recurrent and Sequence Models
Nonlinear RNN approximation is fundamentally limited by a Bernstein-type inverse theorem: stably approximable sequence-to-sequence maps must have exponentially decaying memory kernels, generalizing the "curse of memory" from linear to nonlinear architectures. Overcoming this requires Hurwitz-parameterized recurrent matrices to stably represent slow memory decay (Wang et al., 2023).
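The memory-kernel notion is easiest to see in a linear-recurrent toy model (illustrative; the cited theorem concerns nonlinear RNNs): with state update $h_{t+1} = A h_t + B u_t$ and readout $y_t = C h_t$, the influence of an input applied $k$ steps in the past is $C A^k B$, which decays exponentially whenever the spectral radius of $A$ is below one.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16
A = rng.standard_normal((n, n))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # rescale to spectral radius 0.9 (stable)
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

# Memory kernel rho(k) = |C A^k B|: influence of an input from k steps ago.
kernel = []
Ak = np.eye(n)
for k in range(60):
    kernel.append(abs((C @ Ak @ B).item()))
    Ak = Ak @ A
for k in (0, 10, 20, 40):
    print(f"k = {k:2d}   |C A^k B| = {kernel[k]:.3e}")
```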
6. Practical Implications and Applications
- For Hölder or Sobolev targets on $[0,1]^d$, compositional deep networks with moderate width and depth achieve best-known nonlinear rates, and extra depth offers no further gain (Shen et al., 2019).
- In kernel regimes, nonlinear $N$-term kernel spaces achieve wavelet-optimal rates, and cardinal interpolation yields powerful greedy, truncated, or adaptive sampling-based approximations.
- Library-based partitioning enables scalable surrogates for high-dimensional parametric models (PDEs, uncertainty quantification), with complexity scaling dictated by analytic anisotropy parameters (Guignard et al., 2022).
- For models requiring stability (data assimilation, numerical PDEs, compressed sensing), achievable rates must be benchmarked via entropy or stable manifold widths, not by the raw performance of unconstrained parametrizations (Cohen et al., 2020).
7. Open Problems and Outlook
Open questions include the development of optimal or near-minimal algorithms for coefficient construction in nonlinear/algebraic manifold representations, understanding the precise role of Lipschitz stability across architectures, effective index/function encoding in high-dimensional or multi-valued contexts, and rigorous convergence guarantees for adaptive piecewise or greedy parameter selection schemes. The theory continues to evolve with advances in neural and kernel architectures, high-dimensional surrogate modeling, and algorithmic stability under data and parameter perturbations.
References:
- Nonlinear Approximation via Compositions (Shen et al., 2019)
- Nonlinear Approximation and (Deep) ReLU Networks (Daubechies et al., 2019)
- Regular Families of Kernels for Nonlinear Approximation (Hamm et al., 2016)
- Nonlinear approximation of functions based on non-negative least squares solver (Vabishchevich, 2023)
- Nonlinear tensor product approximation of functions (Bazarkhanov et al., 2014)
- Nonlinear approximation of high-dimensional anisotropic analytic functions (Guignard et al., 2022)
- Optimal Stable Nonlinear Approximation (Cohen et al., 2020)
- Quadratic Formula-based Nonlinear Approximation (He et al., 6 Dec 2025)
- Approximation by Choquet Integral Operators (Gal, 2014)
- Iterative Cut-Based PWA Approximation (Gharavi et al., 2024)
- Inverse Approximation Theory for Nonlinear RNNs (Wang et al., 2023)
- Projection-Based Finite Elements for Nonlinear Function Spaces (Grohs et al., 2018)