α-ReLU: Power Activation in Neural Networks
- α-ReLU is a family of power activation functions defined as [max{0,x}]^α, with α controlling smoothness, homogeneity, and growth behavior.
- It underpins theoretical analysis and practical design in neural networks, benefiting function approximation, PDE solvers, and control barrier synthesis.
- Variants such as the two-slope Leaky α-ReLU (for control barrier synthesis) and sparsifying weight transformations (for regression) preserve computational tractability. The sparsifying variants also achieve minimax-optimal rates in nonparametric regression.
The α-ReLU, or power ReLU, is a parametric family of activation functions of the form $\sigma_\alpha(x) = [\max\{0,x\}]^\alpha$, where $\alpha > 0$ is the exponent parameter. This class generalizes the standard ReLU ($\alpha = 1$). It underpins recent advances in the theoretical analysis and practical design of neural architectures for function approximation, PDE solvers, control barrier functions, and sparse regularized learning. Its properties (homogeneity, smoothness, and approximation-theoretic behavior) depend on the value of $\alpha$, which gives fine-grained control over regularity and functional capacity.
1. Formal Definition and Mathematical Properties
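The α-ReLU activation is defined as
$$\sigma_\alpha(x) = [\max\{0, x\}]^\alpha = \begin{cases} x^\alpha, & x > 0,\\ 0, & x \le 0, \end{cases} \qquad \alpha > 0.$$
Key mathematical properties include: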
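- Homogeneity: For any $\lambda > 0$, $\sigma_\alpha(\lambda x) = \lambda^\alpha \sigma_\alpha(x)$. That is, $\sigma_\alpha$ is positively homogeneous of degree $\alpha$.
- Smoothness: If $\alpha = k + s$ with $k \in \mathbb{N}_0$ and $s \in (0,1)$, then $\sigma_\alpha \in C^{k,s}_{\mathrm{loc}}(\mathbb{R})$. It is $k$-times continuously differentiable, and its $k$-th derivative is $s$-Hölder. For integer $\alpha = k$, $\sigma_k \in C^{k-1,1}_{\mathrm{loc}}(\mathbb{R})$ but $\sigma_k \notin C^k$. The reason is that the $(k-1)$-th derivative equals $k!\,\max\{0,x\}$, whose derivative jumps at $0$.
- Growth at Infinity: $\sigma_\alpha(x) = x^\alpha \to \infty$ as $x \to +\infty$. The growth is sublinear if $\alpha < 1$ and superlinear if $\alpha > 1$.
- Special Cases:
  - Standard ReLU: $\alpha = 1$, $\sigma_1(x) = \max\{0,x\}$.
  - Higher-order: integer $\alpha = k \ge 2$ yields piecewise polynomials of degree $k$.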
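The definition and the homogeneity and smoothness properties above can be checked numerically. The following is a minimal NumPy sketch. The function names `relu_alpha` and `relu_alpha_derivative` are hypothetical helpers introduced here. The derivative routine uses the closed form $\alpha(\alpha-1)\cdots(\alpha-j+1)\,x^{\alpha-j}$, which is valid for $x > 0$; the derivative is zero for $x < 0$.

```python
import numpy as np


def relu_alpha(x, alpha):
    """Power ReLU sigma_alpha(x) = [max(0, x)]**alpha, applied elementwise."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return np.maximum(np.asarray(x, dtype=float), 0.0) ** alpha


def relu_alpha_derivative(x, alpha, order):
    """Classical derivative of sigma_alpha of the given order.

    For x > 0 it equals alpha*(alpha-1)*...*(alpha-order+1) * x**(alpha-order);
    for x < 0 it is 0. The value 0 returned at x = 0 is only correct when
    order < alpha (for integer alpha = k, the k-th derivative jumps at 0).
    """
    x = np.asarray(x, dtype=float)
    coeff = np.prod([alpha - i for i in range(order)])  # falling factorial
    safe = np.where(x > 0, x, 1.0)  # avoid 0**negative on the zero branch
    return np.where(x > 0, coeff * safe ** (alpha - order), 0.0)


if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 9)
    lam, alpha = 3.0, 1.5
    # Positive homogeneity of degree alpha.
    assert np.allclose(relu_alpha(lam * x, alpha), lam**alpha * relu_alpha(x, alpha))

    eps = np.array([-1e-6, 1e-6])
    # Integer alpha = 2: the second derivative jumps from 0 to 2 across x = 0.
    print(relu_alpha_derivative(eps, 2, 2))    # [0. 2.]
    # Fractional alpha = 2.5: the second derivative tends to 0 from both sides.
    print(relu_alpha_derivative(eps, 2.5, 2))  # approximately [0, 0.00375]
```

The two printed lines show the regularity transition. For $\alpha = 2$ the second derivative jumps at the origin. For $\alpha = 2.5$ it is continuous there.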
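Some variants keep the name but change the form. One example is the two-slope Leaky α-ReLU used in control settings (Samanipour et al., 16 Mar 2026). Here α denotes the extended class-$\mathcal{K}$ function of the barrier condition rather than an exponent. The function is piecewise linear: $\alpha(h) = a_+ h$ for $h \ge 0$ and $\alpha(h) = a_- h$ for $h < 0$, with $a_+, a_- > 0$.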
2. Approximation and Regularity in Shallow α-ReLU Networks
Shallow α-ReLU networks are central to the analysis of PDE solution operators and to function approximation in Sobolev/Hölder and Barron-type norms. For the Dirichlet-Laplace (Poisson) problem on half-spaces, solution regularity and approximation rates depend sensitively on $\alpha$ (Vaishampayan et al., 2024, Li et al., 18 May 2026):
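- Fractional $\alpha$ ($\alpha = k + s$, $s \in (0,1)$): Network solutions realize fractional Hölder regularity ($C^{k,s}$). The associated Barron norm $\|\cdot\|_{\mathcal{B}_\alpha}$ is compatible with the solution's smoothness.
- Integer $\alpha = k$: One obtains $C^{k-1,1}$ regularity. The solution has $k-1$ derivatives, and the $(k-1)$-th derivative is Lipschitz continuous, but the solution is not in $C^k$.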
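- Approximation Guarantees: Let the boundary data $g$ have a controlled Barron norm $\|g\|_{\mathcal{B}_\alpha}$. Then the solution $u$ and its Monte Carlo approximation $u_m$ by an $m$-neuron α-ReLU network in the domain satisfy an error bound of the schematic form
$$\|u - u_m\| \lesssim \frac{\|g\|_{\mathcal{B}_\alpha}}{\sqrt{m}}.$$
This holds under technical conditions on $\alpha$, the boundary data, and the norm in which the error is measured. For integer $\alpha$, logarithmic penalties in the Barron norm arise from log-divergences at the boundary (Vaishampayan et al., 2024).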
Shallow ReLU$^\alpha$ networks can also approximate general functions $f$ in Barron-type balls or Sobolev spaces. The resulting rates depend polynomially or log-polynomially on the network width $m$, the exponent $\alpha$, and the spatial dimension $d$ (Li et al., 18 May 2026).
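The $m^{-1/2}$ Monte Carlo rate behind such approximation guarantees can be illustrated on a toy problem. This is not the Poisson construction of Vaishampayan et al. It is a minimal sketch that assumes a hypothetical one-dimensional target written as an expectation over α-ReLU neurons. Each neuron has unit inner weight and bias $b \sim \mathrm{Uniform}[-1,1]$, so that $f(x) = \mathbb{E}_b[\sigma_\alpha(x+b)] = (1+x)^{\alpha+1}/(2(\alpha+1))$ on $[-1,1]$. Averaging $m$ sampled neurons gives a width-$m$ network whose error should shrink roughly like $m^{-1/2}$.

```python
import numpy as np


def target(x, alpha):
    """Exact f(x) = E_b[max(0, x + b)**alpha] for b ~ Uniform[-1, 1]; valid for x in [-1, 1]."""
    return (1.0 + x) ** (alpha + 1) / (2.0 * (alpha + 1))


def sampled_network(x, alpha, m, rng):
    """Width-m alpha-ReLU network: inner weight 1, random biases, outer weights 1/m."""
    b = rng.uniform(-1.0, 1.0, size=m)                   # Monte Carlo sample of neurons
    pre = x[:, None] + b[None, :]                        # (len(x), m) pre-activations
    return (np.maximum(pre, 0.0) ** alpha).mean(axis=1)  # empirical average over neurons


def rms_error(alpha, m, trials=20, seed=0):
    """RMS error on a grid in [-1, 1], averaged over independent neuron draws."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-1.0, 1.0, 201)
    errs = [
        np.sqrt(np.mean((sampled_network(x, alpha, m, rng) - target(x, alpha)) ** 2))
        for _ in range(trials)
    ]
    return float(np.mean(errs))


if __name__ == "__main__":
    for m in (16, 256, 4096):
        print(m, rms_error(alpha=1.5, m=m))
```

Running this should show the error dropping by roughly a factor of 4 for each 16-fold increase in $m$. That is consistent with $m^{-1/2}$ scaling.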
3. Barron, Sobolev, and Spectral Characterizations
The choice of $\alpha$ in α-ReLU directly links the network's functional capacity to analytic regularity scales:
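- Barron Norms: Suppose $f$ admits a representation $f(x) = \mathbb{E}_{(a,w,b)\sim\pi}[a\,\sigma_\alpha(w^\top x + b)]$ as a superposition of α-ReLU ridge functions. The Barron norm $\|f\|_{\mathcal{B}_\alpha}$ is the infimum, over all such representations $\pi$, of the expected weighted coefficient magnitude. This norm governs the approximation error for PDE boundary data (Vaishampayan et al., 2024).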
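- Sobolev Embedding: The function class $\mathcal{B}_\alpha$ embeds into $C^{k,s}$ when $\alpha \notin \mathbb{N}$ and into $C^{k-1,1}$ when $\alpha \in \mathbb{N}$, where $k = \lfloor \alpha \rfloor$ and $s = \alpha - k$.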
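- Path-Norm Regularization: For finite-width networks
$$f_\theta(x) = \sum_{j=1}^{m} a_j\,\sigma_\alpha(w_j^\top x + b_j),$$
the path-norm is
$$\|\theta\|_{\mathcal{P}} = \sum_{j=1}^{m} |a_j|\,\|(w_j, b_j)\|^{\alpha}.$$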
Minimax-optimal generalization rates are achieved for regression over Barron and Sobolev (fractional) targets, with exponents determined by $\alpha$ (Li et al., 18 May 2026).
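Positive homogeneity is what makes a path-norm of this form natural. Rescale neuron $j$'s inner parameters $(w_j, b_j)$ by $\lambda_j > 0$ and its outer weight $a_j$ by $\lambda_j^{-\alpha}$. This leaves both the network function and $\sum_j |a_j|\,\|(w_j,b_j)\|^\alpha$ unchanged. The sketch below checks this numerically. It assumes the Euclidean norm on $(w_j, b_j)$, which may differ from the exact norm used by Li et al. All function names are hypothetical.

```python
import numpy as np


def shallow_net(x, a, W, b, alpha):
    """f(x) = sum_j a_j * max(0, w_j . x + b_j)**alpha for inputs x of shape (n, d)."""
    pre = x @ W.T + b                       # (n, m) pre-activations
    return (np.maximum(pre, 0.0) ** alpha) @ a


def path_norm(a, W, b, alpha):
    """Assumed path norm: sum_j |a_j| * ||(w_j, b_j)||_2 ** alpha."""
    inner = np.linalg.norm(np.hstack([W, b[:, None]]), axis=1)
    return float(np.sum(np.abs(a) * inner ** alpha))


def rescale(a, W, b, lam, alpha):
    """Scale neuron j's (w_j, b_j) by lam_j > 0 and its outer weight a_j by lam_j**(-alpha)."""
    return a * lam ** (-alpha), W * lam[:, None], b * lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, m, alpha = 3, 5, 2.5
    a, W, b = rng.normal(size=m), rng.normal(size=(m, d)), rng.normal(size=m)
    lam = rng.uniform(0.1, 10.0, size=m)
    a2, W2, b2 = rescale(a, W, b, lam, alpha)
    x = rng.normal(size=(7, d))
    # Same function and same path norm after the rescaling.
    print(np.allclose(shallow_net(x, a, W, b, alpha), shallow_net(x, a2, W2, b2, alpha)))
    print(np.isclose(path_norm(a, W, b, alpha), path_norm(a2, W2, b2, alpha)))
```

A norm that weighted $\|(w_j,b_j)\|$ with any power other than $\alpha$ would not be invariant under this rescaling. Such a norm could therefore be made arbitrarily small without changing the function.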
The critical regularity transition occurs when $\alpha$ crosses an integer. Fractional powers $\alpha = k + s$ with $s \in (0,1)$ yield $C^{k,s}$. An integer exponent $\alpha = k$ only ensures Lipschitz continuity of the $(k-1)$-th derivative.
4. α-ReLU in Control Barrier Function Synthesis
Consider control systems that must keep a safe set invariant under polytopic input constraints. For these systems, α-ReLU functions serve as surrogates for the extended class-$\mathcal{K}$ function in the control barrier condition (Samanipour et al., 16 Mar 2026):
- Two-Slope α-ReLU: Parameterized by positive slopes $a_+$ and $a_-$ on $h \ge 0$ and $h < 0$, respectively. This ensures continuity, piecewise differentiability, radial unboundedness, and strict monotonicity.
- Convexity in Synthesis: The two-slope α-ReLU maintains the linearity of control barrier function (CBF) constraints in linear programming synthesis, facilitating tractable certification of safety properties.
- Conservatism and UIS Construction: The union of invariant sets (UIS), obtained by max-composing solutions for different slopes, never reduces the certified safe set below that of the optimal linear α; in most cases, it expands it (Samanipour et al., 16 Mar 2026).
This surrogate captures much of the flexibility of general class-$\mathcal{K}$ nonlinearities. It does so without introducing additional nonconvexity or substantial conservatism in safety certification.
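The effect of the two slopes can be explored on a toy problem. The following is a hypothetical illustration, not an example from Samanipour et al. It uses a scalar single integrator $\dot x = u$ with $|u| \le 1$ and candidate barrier $h(x) = 1 - x^2$. Because $\dot h = h'(x)\,u$ is linear in $u$, its maximum over the input polytope is attained at a vertex. Checking the barrier condition $\max_u \dot h \ge -\alpha(h)$ therefore only requires $u = \pm 1$. The names `two_slope_alpha` and `cbf_condition_holds` are assumptions of this sketch, and no LP synthesis is performed.

```python
import numpy as np


def two_slope_alpha(h, a_pos, a_neg):
    """Two-slope (leaky) alpha: a_pos*h for h >= 0 and a_neg*h for h < 0, with a_pos, a_neg > 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h >= 0.0, a_pos * h, a_neg * h)


def cbf_condition_holds(x, a_pos, a_neg, u_vertices=(-1.0, 1.0)):
    """Check max_u dh/dx * u >= -alpha(h(x)) for the toy system x' = u, h(x) = 1 - x**2.

    The left side is linear in u, so its maximum over the input polytope
    is attained at one of the vertices.
    """
    x = np.asarray(x, dtype=float)
    h = 1.0 - x**2
    dh = -2.0 * x
    best = np.max(np.outer(dh, np.asarray(u_vertices)), axis=1)  # sup over vertices
    return best >= -two_slope_alpha(h, a_pos, a_neg)


if __name__ == "__main__":
    xs = np.array([0.0, 0.9, 2.0, 2.4, 2.5, 3.0])
    # Expected: True for the first four points, False for the last two.
    print(cbf_condition_holds(xs, a_pos=1.0, a_neg=1.0))
```

In this toy problem the condition holds for $|x| \le (1 + \sqrt{1 + a_-^2})/a_-$, which is about $2.41$ for $a_- = 1$. Increasing $a_-$ therefore shrinks the region outside the safe set where the condition can be met. The slope $a_+$ has no effect here because the condition is trivially satisfied inside the safe set.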
5. α-ReLU Variants and Modified Network Architectures
Beyond pointwise power functions, the literature includes sparsifying α-ReLU transforms acting on the weights, notably in nonparametric regression (Beknazaryan et al., 2022):
- Sparsifying α: A continuous, piecewise linear scalar function, constant on part of its domain and linear elsewhere. It is applied entrywise to network weight matrices before multiplication and activation, which imposes structured sparsity.
- Statistical Rates: With $\ell^1$- or squared $\ell^2$-penalized empirical risk minimization, sparsified α-ReLU networks achieve minimax-optimal prediction rates, up to log factors. These rates hold for $\beta$-Hölder regression under sub-Gaussian noise (Beknazaryan et al., 2022).
This approach yields scale-invariance of penalty complexity, bypassing the suboptimal covering behavior of conventional penalized ReLU networks.
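The weight-modification idea can be sketched as a single layer: transform the weights entrywise, then multiply and activate. The map used below is soft-thresholding with threshold $t$. It is a hypothetical stand-in, chosen only because it is continuous, piecewise linear, and constant on an interval. It is not claimed to be the function of Beknazaryan and Sang, and the penalty helper is likewise illustrative.

```python
import numpy as np


def soft_threshold(w, t=0.5):
    """Hypothetical sparsifying map: 0 on [-t, t], w - t*sign(w) outside (continuous, piecewise linear)."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def modified_relu_layer(x, W, b, t=0.5):
    """Apply the map entrywise to W first, then multiply by x, add b, and apply ReLU."""
    W_mod = soft_threshold(W, t)
    return np.maximum(W_mod @ x + b, 0.0)


def penalties(weights):
    """Illustrative l1 and squared-l2 penalties summed over a list of weight arrays."""
    l1 = sum(float(np.abs(w).sum()) for w in weights)
    l2_sq = sum(float((w**2).sum()) for w in weights)
    return l1, l2_sq


if __name__ == "__main__":
    W = np.array([[0.3, -1.2], [2.0, 0.4]])
    b = np.zeros(2)
    x = np.array([1.0, 1.0])
    print(soft_threshold(W))               # entries with |w| <= 0.5 become exactly 0
    print(modified_relu_layer(x, W, b))
    print(penalties([W]))
```

Because the map is exactly zero on an interval, small raw weights produce exact zeros in the effective weight matrix.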
6. Practical Considerations and Trade-offs
Selecting $\alpha$ in ReLU$^\alpha$ activations is a design choice that balances analytic regularity, approximation power, and computational tractability:
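- Regularity Requirements: Applications such as PINNs or strong-form PDE solvers may require $C^k$ regularity of the network, for example $k = 2$ for second-order operators. This holds exactly when $\alpha > k$. Choosing an integer $\alpha \ge k + 1$ also keeps the activation piecewise polynomial.
- Computational Cost: For non-integer $\alpha$, evaluating $\sigma_\alpha$ requires a general floating-point power, typically computed via exponential and logarithm, rather than a few multiplications. This can add significant cost.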
- Barron-Norm Growth: For integer values, an unavoidable logarithmic penalty emerges in the Barron norm near boundaries, impacting the efficiency of representation (Vaishampayan et al., 2024).
- Parameter Interpretability: In two-slope Leaky α-ReLU barrier constructions, the slopes $a_+$ and $a_-$ control how aggressively the barrier condition is enforced. $a_+$ acts on positive values of the barrier function $h$ and $a_-$ on negative values (Samanipour et al., 16 Mar 2026).
A plausible implication is that the α-ReLU family enables granular matching of network expressivity to analytic and application-driven demands, but judicious tuning is required to balance all competing considerations.
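The regularity facts behind this trade-off can be encoded as a small lookup for choosing $\alpha$ against a derivative requirement. This is a minimal sketch with hypothetical function names. It only restates the Hölder classes of $\sigma_\alpha$: non-integer $\alpha = k+s$ gives $C^{k,s}$, and integer $\alpha = k$ gives $C^{k-1,1}$. Consequently $\sigma_\alpha \in C^n$ exactly when $\alpha > n$.

```python
import math


def smoothness_class(alpha):
    """Return (k, s) such that ReLU**alpha lies locally in C^{k,s}.

    Non-integer alpha = k + s with 0 < s < 1 gives C^{k,s}; integer alpha = k
    gives C^{k-1,1}: the (k-1)-th derivative is Lipschitz but the k-th jumps at 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    k = math.floor(alpha)
    if alpha == k:
        return k - 1, 1.0
    return k, alpha - k


def is_classically_differentiable(alpha, order):
    """True if ReLU**alpha is `order` times continuously differentiable on R."""
    return alpha > order


if __name__ == "__main__":
    for alpha in (1, 1.5, 2, 2.5, 3):
        k, s = smoothness_class(alpha)
        print(alpha, f"C^{{{k},{s:g}}}", "C^2:", is_classically_differentiable(alpha, 2))
```

For a second-order strong-form residual, this rule excludes $\alpha = 2$ and admits $\alpha = 2.5$ or $\alpha = 3$.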
7. Comparative Summary
| Variant | Mathematical Formulation | Main Use/Result |
|---|---|---|
| Standard Power α-ReLU | $[\max\{0,x\}]^\alpha$ | PDE solvers, approximation with controlled regularity (Vaishampayan et al., 2024, Li et al., 18 May 2026) |
| Two-slope Leaky α-ReLU | Piecewise linear with slopes $a_+, a_- > 0$ | Barrier certification under control saturation (Samanipour et al., 16 Mar 2026) |
| Sparsifying α-ReLU | Piecewise constant-linear on weights | Sparse nonparametric regression at minimax rates (Beknazaryan et al., 2022) |
References
- Vaishampayan and Wojtowytsch, "Solving the Poisson Equation with Dirichlet data by shallow ReLU$^\alpha$-networks" (Vaishampayan et al., 2024).
- Beknazaryan and Sang, "Nonparametric regression with modified ReLU networks" (Beknazaryan et al., 2022).
- Li, Liu, and Shi, "Shallow ReLU$^\alpha$ Networks in …-Type and Sobolev Spaces: Approximation and Path-Norm Controlled Generalization" (Li et al., 18 May 2026).
- Samanipour et al., "ReLU Barrier Functions for Nonlinear Systems with Constrained Control: A Union of Invariant Sets Approach" (Samanipour et al., 16 Mar 2026).