Elementary Universal Activation Function
- EUAF is a class of elementary activation functions that guarantee universal approximation by meeting strict non-polynomial and continuity criteria.
- It leverages harmonic analysis, ridgelet transforms, and parametrized adaptivity to enable dense function approximation on both compact and non-compact domains.
- EUAF methodologies promote architectural parsimony and practical adaptability, impacting tasks from vision to dynamic system modeling.
An Elementary Universal Activation Function (EUAF) is a class of activation functions for neural networks that, despite their elementary structure, confer the universal approximation property on networks in both theory and practice. The universality of EUAFs refers to the capacity of a neural network employing such a function (or a parametrized version thereof) to approximate any target function from broad classes (e.g., continuous, $L^p$, Sobolev) on compact or even non-compact domains, arbitrarily well, given sufficient expressivity in terms of weights, biases, or composition depth. The study and characterization of EUAFs spans harmonic analysis, approximation theory, dynamical systems, and practical machine learning, centering on the algebraic, analytic, and geometric properties that guarantee density in function spaces and enable efficient signal propagation.
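To make the universality claim concrete, the following minimal sketch (ours, not drawn from any of the cited papers) fits a one-hidden-layer tanh network to a continuous target on a compact interval; with random hidden features and a least-squares output layer, the sup-norm error typically shrinks as the width grows:

```python
import numpy as np

# Minimal illustration: a one-hidden-layer tanh network approximating
# a continuous target on a compact set.  Hidden weights/biases are
# fixed at random; only the linear output layer is fit by least
# squares (a random-features sketch of universal approximation).
rng = np.random.default_rng(0)

def target(x):
    return np.abs(x) + np.sin(3 * x)   # continuous, non-smooth target

n_hidden = 200
W = rng.normal(scale=3.0, size=n_hidden)   # hidden weights
b = rng.uniform(-3.0, 3.0, size=n_hidden)  # hidden biases

x = np.linspace(-1.0, 1.0, 400)
H = np.tanh(np.outer(x, W) + b)            # hidden-layer activations
c, *_ = np.linalg.lstsq(H, target(x), rcond=None)  # output layer fit

err = np.max(np.abs(H @ c - target(x)))
print(f"sup-norm error with {n_hidden} tanh neurons: {err:.2e}")
```

Widening the hidden layer (or training it too) drives the error toward zero, which is exactly the qualitative content of the universal approximation property.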
1. Mathematical Foundations and Universality Criteria
The foundational criterion for universality in activation functions derives from results such as Cybenko's theorem and its many generalizations (Sonoda et al., 2015, Neufeld et al., 18 Oct 2024, Shin et al., 10 Apr 2025). Specifically, an activation function is universal if:
- It is non-polynomial and continuous (classical scalar case).
- Its derivatives (possibly up to a certain order) are integrable and bounded, forming a neural network approximate identity (nAI) (Bui-Thanh, 2021); see the numerical sketch after this list.
- It satisfies certain admissibility conditions in a generalized harmonic analysis framework, such as reconstructivity in the ridgelet transform construction, i.e., for an admissible pair $(\eta, \psi)$ of activation function $\eta$ and ridgelet function $\psi$:

  $$K_{\eta,\psi} = (2\pi)^{d-1} \int_{\mathbb{R}} \frac{\widehat{\eta}(\zeta)\,\overline{\widehat{\psi}(\zeta)}}{|\zeta|^{d}}\,\mathrm{d}\zeta, \qquad 0 < |K_{\eta,\psi}| < \infty.$$
- For unbounded activations (e.g., ReLU), universality is maintained under Lizorkin distribution regularity, provided the admissibility condition above holds (Sonoda et al., 2015).
The EUAF framework encompasses both fixed and parametrized functions, including smooth, piecewise analytic, periodically oscillatory, and even certain non-smooth constructs—subject to these criteria.
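As a concrete illustration of the nAI bullet above, a symmetric finite difference of tanh neurons yields a normalized bump that concentrates as its scale grows; convolving a target with this bump reproduces the target. A minimal numerical sketch (ours; (Bui-Thanh, 2021) develops the general construction and its error rates):

```python
import numpy as np

# A finite difference of two tanh neurons yields the bump
#   k_n(u) ~= (n/2) * sech^2(n*u),
# which integrates to 1 and concentrates as n grows -- an approximate
# identity built from tanh units.
def bump(u, n, h=1e-3):
    return (n / 2) * (np.tanh(n * u + h) - np.tanh(n * u - h)) / (2 * h)

def target(x):
    return np.cos(2 * x) + 0.3 * x

# Convolve the target with the bump by a Riemann sum on a grid.
y = np.linspace(-3, 3, 2000)
dy = y[1] - y[0]
x = np.linspace(-1, 1, 200)
for n in (2, 8, 32):
    approx = (bump(x[:, None] - y[None, :], n) * target(y)).sum(axis=1) * dy
    print(f"n={n:3d}  sup error = {np.max(np.abs(approx - target(x))):.3e}")
```

The printed sup-norm errors decrease in $n$, mirroring the approximate-identity mechanism behind the nAI criterion.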
2. Squashable and Superexpressive Activation Functions
The concept of "squashable" activation functions is pivotal; a function is squashable if, via composition with affine mappings, it can approximate both the identity and step functions on any compact set (Shin et al., 10 Apr 2025). Explicitly:
- Condition 1 (Identity): continuous differentiability and a nonzero derivative at some point $x_0$.
- Condition 2 (Step): existence of a width-1 $\sigma$-network approximating the binary threshold outside any arbitrarily small region.
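Both conditions are straightforward to verify numerically for the sigmoid; the sketch below (an illustration of the definitions, not code from (Shin et al., 10 Apr 2025)) recovers the identity by rescaling the locally linear regime around $0$ and the step function by sharpening the slope:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-1, 1, 401)

# Condition 1 (identity): sigma'(0) = 1/4 != 0, so the affine
# rescaling (sigma(eps*x) - sigma(0)) / (eps/4) -> x as eps -> 0.
for eps in (1.0, 0.1, 0.01):
    ident = (sigmoid(eps * x) - 0.5) / (eps * 0.25)
    print(f"identity, eps={eps:5.2f}: sup error {np.max(np.abs(ident - x)):.2e}")

# Condition 2 (step): sigma(x/eps) -> 1[x > 0] pointwise away from 0;
# we measure the error outside the small region |x| < delta.
delta = 0.05
mask = np.abs(x) > delta
step = (x > 0).astype(float)
for eps in (1.0, 0.1, 0.01):
    print(f"step, eps={eps:5.2f}: sup error off |x|<{delta}: "
          f"{np.max(np.abs(sigmoid(x / eps) - step)[mask]):.2e}")
```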
Table: Classes of Activation Functions Satisfying Squashability
| Class | Criteria Satisfied | Example Functions |
|---|---|---|
| Non-affine analytic | Both identity and step function (via affine composition) | Sigmoid, tanh, sine, exp |
| Piecewise, with nonzero one-sided derivatives | Step via kink, identity via local invertibility | Leaky-ReLU, h-swish |
Superexpressive families (e.g., $\{\sin, \arcsin\}$) provide a fixed-width architecture, independent of the target function's complexity, capable of universal approximation (Yarotsky, 2021, Wang et al., 12 Jul 2024). Periodic and inverse trigonometric functions facilitate dense coding via irrational-winding arguments, in contrast to Pfaffian or piecewise-polynomial activations, which lack sufficient oscillatory complexity for such properties.
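The irrational-winding mechanism behind these superexpressive constructions is elementary: for irrational $\alpha$, the fractional parts $\{n\alpha\}$ equidistribute in $[0,1]$ (Weyl), so a single tuned frequency can drive a periodic activation arbitrarily close to any prescribed value. A minimal numerical illustration (ours):

```python
import numpy as np

# Weyl equidistribution: for irrational alpha, frac(n * alpha) visits
# every subinterval of [0, 1].  Superexpressive families exploit this
# dense winding: tuning one frequency parameter steers a periodic
# activation arbitrarily close to any prescribed target value.
alpha = np.sqrt(2.0)
n = np.arange(1, 200_001)
orbit = np.mod(n * alpha, 1.0)

for t in (0.123, 0.5, 0.987):
    k = np.argmin(np.abs(orbit - t))
    print(f"target {t}: frac({k + 1} * sqrt(2)) = {orbit[k]:.6f}")
```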
3. Construction via Harmonic Analysis: Ridgelet, Radon, and Backprojection
For unbounded or Lizorkin activation functions, universality is established constructively through ridgelet transform theory (Sonoda et al., 2015):
- The ridgelet transform analyzes $f$ along hyperplane slices:

  $$\mathscr{R}_\psi f(a, b) = \int_{\mathbb{R}^d} f(x)\,\overline{\psi(a \cdot x - b)}\,\mathrm{d}x, \qquad (a, b) \in \mathbb{R}^d \times \mathbb{R}.$$

- Its inversion employs admissible pairs $(\eta, \psi)$ and reconstructs $f$ via the dual ridgelet transform:

  $$\mathscr{R}^\dagger_\eta T(x) = \int_{\mathbb{R}^d \times \mathbb{R}} T(a, b)\,\eta(a \cdot x - b)\,\mathrm{d}a\,\mathrm{d}b, \qquad \mathscr{R}^\dagger_\eta \mathscr{R}_\psi f = K_{\eta,\psi}\, f.$$

- The backprojection filter (Radon inversion) is interpreted as what the network "learns" after backpropagation; Parseval's relation establishes energy conservation:

  $$\langle \mathscr{R}_\psi f, \mathscr{R}_\psi g \rangle = K_{\psi,\psi}\,\langle f, g \rangle.$$
Activation function choice determines the functional behavior in both analysis (ridgelet) and synthesis (dual transform) steps—admissibility is then a precise regularity and frequency condition.
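As a sanity check on the admissibility condition from Section 1, the constant $K_{\eta,\psi}$ can be computed for a concrete pair. The sketch below (ours; normalization conventions vary across the literature) takes $d = 1$ and $\eta = \psi$ equal to the Mexican-hat wavelet, whose Fourier transform is $\widehat{\psi}(\zeta) = \zeta^2 e^{-\zeta^2/2}$ up to normalization; the integral is finite and nonzero, so the pair is admissible:

```python
import numpy as np

# Admissibility check in d = 1 for eta = psi = Mexican-hat wavelet,
# with psi_hat(z) = z^2 * exp(-z^2 / 2) up to normalization.
# Analytically,
#   K = int_R |psi_hat(z)|^2 / |z| dz = 2 * int_0^inf z^3 e^{-z^2} dz = 1,
# finite and nonzero, so the pair (eta, psi) is admissible.
z = np.linspace(1e-6, 12.0, 200_001)
dz = z[1] - z[0]
psi_hat = z**2 * np.exp(-z**2 / 2)
K = 2.0 * np.sum(psi_hat**2 / z) * dz   # Riemann-sum quadrature
print(f"K = {K:.6f}  (exact value: 1)")
```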
4. Parametric and Adaptive EUAFs
Modern practice introduces parametrized activation functions (e.g., Universal Activation Function (UAF), Parametric Elementary Universal Activation Function (PEUAF)) (Yuen et al., 2020, Wang et al., 12 Jul 2024):
- UAF: a single smooth expression with several trainable shape parameters, optimized jointly with the network weights (Yuen et al., 2020).
- PEUAF (triangle-wave + analytic tail), with trainable frequency $w > 0$ (implemented in the sketch below):

  $$\sigma_w(x) = \begin{cases} \left| w x - 2 \left\lfloor \dfrac{w x + 1}{2} \right\rfloor \right|, & x \geq 0, \\[4pt] \dfrac{x}{1 + |x|}, & x < 0, \end{cases}$$

  i.e., a period-$2/w$ triangle wave on $[0, \infty)$ glued to a bounded soft-sign tail on $(-\infty, 0)$.
- These functions morph among known forms (identity, sigmoid, ReLU, Mish, etc.) as their parameters adapt during gradient-based training, so the task-specific nonlinearity is itself learned by optimization.
In practical tasks (CIFAR-10, gas quantification, reinforcement learning), networks using UAFs/PEUAFs evolve their parameters to match nearly optimal fixed activations or discover new ones, demonstrating empirical universality and adaptability.
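A minimal trainable implementation of the PEUAF form above (our sketch, assuming the triangle-wave/soft-sign definition given in this section; the class name `PEUAF` and parameter `w` are illustrative, not an official API):

```python
import torch
import torch.nn as nn

class PEUAF(nn.Module):
    """Parametric EUAF sketch: period-2/w triangle wave for x >= 0,
    soft-sign tail for x < 0, with the frequency w trained by SGD."""

    def __init__(self, w_init: float = 1.0):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(float(w_init)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        wx = self.w * x
        # Triangle wave: |wx - 2 * floor((wx + 1) / 2)|
        tri = torch.abs(wx - 2.0 * torch.floor((wx + 1.0) / 2.0))
        tail = x / (1.0 + torch.abs(x))   # soft-sign tail for x < 0
        return torch.where(x >= 0, tri, tail)

# Usage: drop it into a network like any other activation.
net = nn.Sequential(nn.Linear(1, 16), PEUAF(), nn.Linear(16, 1))
y = net(torch.linspace(-1, 1, 8).unsqueeze(-1))
print(y.shape)  # torch.Size([8, 1])
```

Because `torch.floor` has zero gradient almost everywhere, the frequency `w` still receives gradients through the linear factor `w * x`, so it adapts during training like any other parameter.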
5. Minimum Width and Architectural Parsimony
The theoretical bound for minimum width in networks using EUAFs is established as $\max(d_x, d_y, 2)$ for input dimension $d_x$ and output dimension $d_y$ (Shin et al., 10 Apr 2025). For monotone activation functions and $d_x = d_y = 1$, width $2$ is both necessary and sufficient: a width-1 network with a monotone activation composes monotone maps and is therefore itself monotone, so it cannot approximate non-monotone targets such as $|x|$.
Table: Minimum Width for Universal Approximation with EUAF
| $d_x$ | $d_y$ | Monotone $\sigma$ | Min. Width |
|---|---|---|---|
| 1 | 1 | Yes | 2 |
| ≥ 2 | Any | Yes/No | $\max(d_x, d_y)$ |
| Any | ≥ 2 | Yes/No | $\max(d_x, d_y)$ |
This establishes the parsimonious architectural regimes in which EUAF-based networks maintain universality.
6. Extensions: Refinability, Structural Manipulations, and Universal Domains
Beyond standard universality:
- Refinable activation functions (e.g., spline-based) allow splitting neurons and inserting layers without changing the network output, leveraging subdivision theory via two-scale or refinement equations (López-Ureña, 16 Oct 2024); see the sketch after this list.
- EUAFs ensure universal approximation over non-compact domains (weighted or Sobolev spaces) as long as the activation is non-polynomial and meets certain growth and regularity constraints (Neufeld et al., 18 Oct 2024).
- In ODENet and ResNet architectures, the use of a single non-polynomial, Lipschitz continuous EUAF is sufficient for universal approximation of continuous dynamical mappings, with function class and discretization error controlled robustly (Kimura et al., 22 Oct 2024).
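To make the refinability bullet concrete: the hat function (linear B-spline) satisfies the two-scale relation $B(x) = \tfrac{1}{2} B(2x+1) + B(2x) + \tfrac{1}{2} B(2x-1)$, which is exactly the identity that lets a spline neuron be split into finer-scale copies without changing the computed function. A numerical check (ours, using the linear B-spline as the example; (López-Ureña, 16 Oct 2024) treats general refinable activations):

```python
import numpy as np

def hat(x):
    """Linear B-spline (hat function) supported on [-1, 1]."""
    return np.maximum(0.0, 1.0 - np.abs(x))

# Two-scale (refinement) relation: hat(x) equals a fixed combination
# of half-scale shifted copies of itself.  This identity is what lets
# one spline neuron be replaced by several finer-scale neurons (or an
# inserted layer) while leaving the network's output unchanged.
x = np.linspace(-2.0, 2.0, 1001)
lhs = hat(x)
rhs = 0.5 * hat(2 * x + 1) + hat(2 * x) + 0.5 * hat(2 * x - 1)
print(f"max |lhs - rhs| = {np.max(np.abs(lhs - rhs)):.2e}")  # ~1e-16
```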
7. Practical Implications, Optimization, and Future Directions
The EUAF paradigm provides:
- A general framework for activation function search, including entropy-based optimization (EAFO), with explicit correction schemes to reduce information entropy and improve robustness (e.g., CRReLU) (Sun et al., 19 May 2024).
- Constructive guidance for architecture design, including explicit formulas for neuron number, scaling, weights, and non-asymptotic error rates (Bui-Thanh, 2021).
- Adaptation to equivariant and domain-specific architectures (e.g., unitary equivariant networks) using generalized invariant scalar functionals fused with any standard EUAF (Ma, 17 Nov 2024).
- Opportunities for further exploration of superexpressiveness, parametric adaptivity, and composite or hybrid activations, as indicated by the empirical success and theoretical density results obtained for periodic, spline-based, and analytic forms.
References to Key Literature
- Harmonic analysis/ridgelet: (Sonoda et al., 2015)
- Squashability and minimum width: (Shin et al., 10 Apr 2025)
- Superexpressive periodic families: (Yarotsky, 2021)
- Parametric and adaptive universality: (Yuen et al., 2020, Wang et al., 12 Jul 2024)
- Approximate identity/unified constructive frameworks: (Bui-Thanh, 2021)
- Spline/refinable: (López-Ureña, 16 Oct 2024)
- Universality on non-compact domains: (Neufeld et al., 18 Oct 2024)
- ODENet/ResNet with single EUAF: (Kimura et al., 22 Oct 2024)
- Entropy-based activation optimization: (Sun et al., 19 May 2024)
- Unitary equivariant generalized activations: (Ma, 17 Nov 2024)
EUAF research demonstrates that elementary, yet fundamentally robust and mathematically sound, activation functions support the full expressive power of neural network architectures. This has yielded both deeper theoretical insight and improved practical adaptability, marking EUAFs as central objects in the study and design of advanced neural systems.