Rational Neural Networks (RationalNets)
- Rational Neural Networks are architectures that use rational functions (quotients of polynomials) to replace traditional activations, offering superior expressivity and approximation efficiency.
- They achieve exponential efficiency in approximating non-smooth and oscillatory functions compared to ReLU and polynomial networks, as demonstrated in tasks like PDE solving and graph learning.
- Architectural variants such as RatioNet and recurrent rational units support robust control, symbolic regression, and interpretable feature extraction across diverse scientific and engineering applications.
Rational Neural Networks (RationalNets) are neural architectures in which the nonlinear components, most prominently the activation functions or the learned basis in feature-extraction layers, are constructed explicitly as rational functions—quotients of polynomials. This formulation enables RationalNets to achieve strong expressive power with high approximation efficiency, particularly in regimes requiring the modeling of non-smooth features, singularities, or complex analytic relationships. Recent work has illuminated both the theoretical underpinnings and the rich diversity of applications for RationalNets, spanning universal approximation results, spectral graph learning, efficient ODE solvers, robust control systems, and algebraic-geometric characterizations.
1. Approximation Power and Efficiency
A central theoretical advance is the quantification of approximation rates between rational neural networks and classical ReLU models (Telgarsky, 2017, Boullé et al., 2020, Morina et al., 27 Aug 2025). For any ReLU network, there exists a rational function—constructed, for example, via Newman polynomials—of degree $O(\mathrm{polylog}(1/\varepsilon))$ that is $\varepsilon$-close in supremum norm; conversely, any rational function can be mimicked by a ReLU network whose size likewise scales only polylogarithmically in $1/\varepsilon$. Uniform approximation of certain non-smooth functions (e.g., the ReLU or absolute value) is exponentially more efficient for rational functions than for polynomials: a degree-$n$ rational approximant of $|x|$ attains error $O(e^{-c\sqrt{n}})$, whereas polynomials require degree on the order of $1/\varepsilon$ to reach accuracy $\varepsilon$.
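This gap between rational and polynomial approximation of non-smooth targets can be checked directly. The following minimal NumPy sketch (an independent numerical illustration, not code from the cited papers) compares Newman's classical degree-$n$ rational approximant of $|x|$ with a Chebyshev polynomial fit of the same degree; the rational error decays like $e^{-\sqrt{n}}$, the polynomial error only like $1/n$.

```python
import numpy as np

def newman_abs(x, n):
    """Newman's degree-n rational approximant of |x| on [-1, 1]:
    r(x) = x * (p(x) - p(-x)) / (p(x) + p(-x)),  p(t) = prod_k (t + zeta**k)."""
    zeta = np.exp(-1.0 / np.sqrt(n))
    roots = zeta ** np.arange(n)
    p = lambda t: np.prod(t[:, None] + roots[None, :], axis=1)
    return x * (p(x) - p(-x)) / (p(x) + p(-x))

x = np.linspace(-1.0, 1.0, 4001)
for n in (16, 32, 64):
    rational_err = np.max(np.abs(np.abs(x) - newman_abs(x, n)))
    cheb = np.polynomial.chebyshev.Chebyshev.fit(x, np.abs(x), deg=n)  # near-best polynomial
    poly_err = np.max(np.abs(np.abs(x) - cheb(x)))
    print(f"degree {n:3d}: rational sup-error {rational_err:.1e}, polynomial sup-error {poly_err:.1e}")
```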
Recent results extend these findings to higher-order smoothness (Morina et al., 27 Aug 2025), showing that a target function and its derivatives can be approximated simultaneously in Sobolev-type norms by rational neural networks of constant depth and moderate width, at rates that polynomial approximants of comparable size cannot match.
Rational activations of low but fixed degree, such as type (3,2)—a cubic numerator over a quadratic denominator—are particularly favored due to their compositional properties: composing $k$ such activations yields rational functions whose degree grows exponentially in $k$, while the number of trainable coefficients grows only linearly (Boullé et al., 2020). This aligns with the observed exponential reduction in depth and parameters needed to reach a target accuracy compared to ReLU or polynomial networks.
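This degree growth can be verified symbolically. The short sympy sketch below (using arbitrary illustrative coefficients, not an initialization from the cited work) composes a fixed type-(3,2) rational map with itself and reports the numerator and denominator degrees after each composition.

```python
import sympy as sp

x = sp.symbols('x')
# An arbitrary type-(3,2) rational map: cubic numerator over quadratic denominator.
r = (x**3 + 2*x**2 - x + 1) / (x**2 + 3)

f, degrees = r, []
for _ in range(3):
    num, den = sp.fraction(sp.cancel(f))
    degrees.append((sp.degree(num, x), sp.degree(den, x)))
    f = f.subs(x, r)   # compose one more "layer"

# Numerator degrees grow like 3**k, while each extra layer adds only a fixed
# number of coefficients (seven for a type-(3,2) activation).
print(degrees)
```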
2. Architectural Variants and Theoretical Models
The architectural flexibility afforded by rational parameterizations has led to several innovative frameworks:
- Classical feedforward architectures: Rational activation functions are learned as $\sigma(x) = P(x)/Q(x) = \big(\sum_{i=0}^{r_P} a_i x^i\big) \big/ \big(\sum_{j=0}^{r_Q} b_j x^j\big)$ with trainable coefficients $a_i$ and $b_j$; these can be initialized either (i) to best Chebyshev rational approximations of desired functions (e.g., ReLU), or (ii) randomly and optimized via gradient descent (Boullé et al., 2020, Peiris, 2021). A minimal sketch of such a trainable activation appears after this list.
- Ratio Net and related rational approximators: As an alternative to the standard sequence of affine maps and activations, the Ratio Net eschews explicit nonlinearities in favor of directly modeling the output as a rational mapping of the input. This generalizes the Padé approximant principle to high-dimensional, multi-output settings (Zhou et al., 2020, Qin et al., 2021).
- Spectral Graph Filters: In graph convolutional and spectral GNN architectures, rational function filters of graph Laplacians are used to capture sharp transitions and to address deficiencies of Chebyshev or polynomial filters. RationalNet and ERGNN replace or augment polynomial filters with explicitly optimized numerator and denominator rational components (Chen et al., 2018, Li et al., 26 Dec 2024). A generic filtering sketch also appears after this list.
- Recurrent and Residual Rational Units: Adaptive rational activations—optionally shared ("recurrent") across layers—achieve both plasticity and strong function representation for reinforcement learning, and can absorb the effect of residual connections, leading to inherent regularization and gradient flow benefits (Delfosse et al., 2021).
- Algebraic-Geometric Interpretations: By treating the ensemble of all network outputs (for fixed architectures) as an algebraic variety (the “neuromanifold”), it becomes possible to analyze expressivity, parameter identifiability, and function membership using explicit tools from algebraic geometry (Grosdos et al., 14 Sep 2025).
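As referenced in the feedforward bullet above, the following PyTorch sketch shows one way to implement a trainable type-(3,2) rational activation. It is a minimal illustration under its own parameterization choices (an absolute-value safeguard keeps the denominator positive), not the implementation used in the cited works.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Trainable type-(3,2) rational activation sigma(x) = P(x)/Q(x).

    P is a cubic with learned coefficients a0..a3; Q = 1 + |b1*x + b2*x^2| is kept
    positive so the quotient has no poles on the real line (one common safeguard).
    """

    def __init__(self):
        super().__init__()
        # Identity-like initialization; Boullé et al. instead initialize to a best
        # rational approximation of ReLU.
        self.a = nn.Parameter(torch.tensor([0.0, 1.0, 0.0, 0.0]))
        self.b = nn.Parameter(torch.tensor([0.0, 0.1]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        numerator = self.a[0] + self.a[1] * x + self.a[2] * x**2 + self.a[3] * x**3
        denominator = 1.0 + torch.abs(self.b[0] * x + self.b[1] * x**2)
        return numerator / denominator

# Drop-in use inside an ordinary feedforward network.
model = nn.Sequential(nn.Linear(8, 32), RationalActivation(),
                      nn.Linear(32, 32), RationalActivation(),
                      nn.Linear(32, 1))
y = model(torch.randn(4, 8))   # output shape (4, 1)
```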
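As referenced in the spectral-filter bullet above, a rational filter $r(L) = Q(L)^{-1}P(L)$ of a graph Laplacian can be applied to a signal without eigendecomposition by solving a linear system. The NumPy sketch below is a generic illustration of this mechanism, not the RationalNet or ERGNN implementations.

```python
import numpy as np

def rational_graph_filter(L, x, num_coeffs, den_coeffs):
    """Apply r(L) x = Q(L)^{-1} P(L) x, with P and Q given by coefficient lists
    (constant term first); Q(L) must be nonsingular."""
    P = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(num_coeffs))
    Q = sum(c * np.linalg.matrix_power(L, k) for k, c in enumerate(den_coeffs))
    return np.linalg.solve(Q, P @ x)

# Small example graph: a 4-cycle and its normalized Laplacian (spectrum in [0, 2]).
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
deg = A.sum(axis=1)
L = np.eye(4) - A / np.sqrt(np.outer(deg, deg))

x = np.random.randn(4)                                  # a graph signal
# Low-pass-style rational filter r(lambda) = 1 / (1 + 2*lambda).
y = rational_graph_filter(L, x, num_coeffs=[1.0], den_coeffs=[1.0, 2.0])
```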
3. Comparative Studies and Expressivity
Direct comparison of rational, polynomial, and ReLU-based neural networks highlights the exponential efficiency gains of rational approximators for certain function classes. In tasks involving non-smooth targets (e.g., $|x|$, $\max(0,x)$, and max-pooling-type functions) or oscillatory functions, rational-function-based neural networks achieve an order-of-magnitude lower error with the same or fewer parameters than conventional neural or polynomial approximators (Peiris et al., 2023).
The expressive power of rational neural networks has nuanced boundaries, particularly in the context of graph neural networks (GNNs). Whereas rational feedforward networks enjoy universal approximation properties, rational GNNs cannot uniformly capture the full extent of queries expressible in the two-variable graded modal logic (GC2), unlike their ReLU GNN counterparts. Rational GNNs are limited to a strict sub-fragment (RGC2) that reflects the aggregation and degree constraints imposed by rational computation (Khalife, 2023). This divergence points to a complex landscape in which universal approximation in the classical sense does not guarantee maximal logical expressivity in more structured domains.
4. Applications in Scientific Computing, Symbolic Regression, and Control
RationalNets have demonstrated superior performance in several high-impact domains:
- Numerical PDEs and High-Order ODEs: Rational neural networks (and the Ratio Net) have been used as trial functions for physics-informed neural solvers, where their rational structure provides a larger convergence radius and more efficient traversal of the function space compared to MLP or polynomial networks (Qin et al., 2021, Shahane et al., 13 Sep 2024). Rational activations can more efficiently model sharp fronts and discontinuities in PDEs, as evidenced by Rational-WENO—a neural WENO framework whose rational layers significantly reduce dissipation and improve accuracy in fluid simulations.
- Symbolic and Analytic Regression: Rational function-based architectures (e.g., RafNN, ParFam, EQL) are uniquely suited for reconstructing closed-form physical models from data (Sun, 2021, Morina et al., 27 Aug 2025). For example, RafNN can rediscover Gassmann's equation for rock physics solely from observations, leveraging a sparse, principled structure in both numerator and denominator polynomials. These architectures are widely used in symbolic regression for physical-law discovery, with rational components ensuring expressive parsimony and high accuracy; a minimal rational-fitting sketch appears after this list.
- Robust Control Systems: The convexity of certain rational neural network architectures in their parameters has enabled the formulation of controller synthesis as a single SOS (sum-of-squares) feasibility problem. Rational activation functions that admit exact polynomial characterizations facilitate less conservative robustness certificates in feedback loops for nonlinear plants (Newton et al., 2023).
- Wavelet Feature Extraction: Rational Gaussian wavelets, with tunable zeros and poles, can be adapted in a neural layer to extract interpretable and highly sparse time-frequency features (e.g., for biomedical signal classification). A differentiable variable projection operator optimizes the placement of wavelet coefficients for maximal discrimination (Ámon et al., 3 Feb 2025).
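As noted in the symbolic-regression bullet above, a rational model $y \approx P(x)/Q(x)$ can be fitted by linearizing the residual as $P(x_i) - y_i\,Q(x_i) \approx 0$ under the normalization $Q(0)=1$, which turns the problem into ordinary least squares. The sketch below illustrates this generic idea on synthetic data; it is not the RafNN, ParFam, or EQL pipeline.

```python
import numpy as np

def fit_rational(x, y, num_deg, den_deg):
    """Fit y ~ P(x)/Q(x), with Q(0) = 1, by least squares on the linearized
    residual P(x) - y*Q(x) ~ 0 (unknowns: a0..a_p and b1..b_q)."""
    A_num = np.vander(x, num_deg + 1, increasing=True)                       # 1, x, ..., x^p
    A_den = -y[:, None] * np.vander(x, den_deg + 1, increasing=True)[:, 1:]  # -y*x, ..., -y*x^q
    coeffs, *_ = np.linalg.lstsq(np.hstack([A_num, A_den]), y, rcond=None)
    return coeffs[:num_deg + 1], coeffs[num_deg + 1:]

def eval_rational(a, b, x):
    P = np.vander(x, len(a), increasing=True) @ a
    Q = 1.0 + np.vander(x, len(b) + 1, increasing=True)[:, 1:] @ b
    return P / Q

# Synthetic data generated by a known rational law, plus a little noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 3.0, 200)
y_true = (1.0 + 0.5 * x) / (1.0 + 0.2 * x**2)
y = y_true + 0.01 * rng.standard_normal(x.size)

a, b = fit_rational(x, y, num_deg=1, den_deg=2)
print("recovered coefficients:", a, b)
print("max abs error vs. true law:", np.max(np.abs(eval_rational(a, b, x) - y_true)))
```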
5. Theoretical Capacity, Complexity, and Algebraic Geometry
Recent work has connected the parameterization complexity of RationalNets to deep invariants in algebraic geometry and learning theory:
- Erzeugungsgrad and VC-Dimension: The Erzeugungsgrad, a measure of the generation degree of a Boolean algebra of constructible sets, directly relates to VC-dimension when evaluating classifier families derived from rational neural networks. The VC-dimension is bounded by the Krull dimension of the network's parameter space, up to logarithmic corrections (Pardo et al., 15 Apr 2025). Algebraic intersection theory and degree estimates (e.g., via refined Bezout-type inequalities) are thus key to quantifying the shattering capacity and sample complexity of rational function classifiers.
- Algebraic Structure of Network Expressivity: The “neuromanifold” abstraction encodes the set of all functions accessible to a given RationalNet as an algebraic variety. For networks with a single hidden layer and a rational activation such as $1/x$, one can provide factorization-based defining equations for the manifold and algorithms for deciding function membership or parameter recovery; a small symbolic illustration of this single-hidden-layer case appears after this list. For deep binary RationalNets, the Zariski closure of the neuromanifold equals the ambient space if and only if the output is one-dimensional, indicating when true universal approximation is achieved within the rational function class (Grosdos et al., 14 Sep 2025). These algebraic-geometric techniques facilitate identifiability analysis, loss landscape exploration, and model verification.
- Depth Lower Bounds with Bounded Rational Weights: The expressivity of ReLU networks with rational (e.g., $N$-ary fraction) weights is strongly limited in depth when aiming for exact representations of functions such as $\max\{0, x_1, \ldots, x_n\}$. For decimal-fraction weights, the number of hidden layers required grows at least logarithmically with $n$ (Averkov et al., 10 Feb 2025). This result, based on normalized volume invariants of lattice polytopes and generalized “clearing denominators” arguments, highlights the tradeoff between width and depth in rational architectures, mirroring a broader separation in expressive capacity.
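To make the neuromanifold bullet above concrete, the sympy sketch below (a generic illustration of the rational structure, not the defining equations derived by Grosdos et al.) writes a width-3 single-hidden-layer network with activation $t \mapsto 1/t$ over a common denominator; the output is a rational function of $x$ whose denominator factors into the affine forms determined by the hidden layer.

```python
import sympy as sp

x = sp.symbols('x')
w = sp.symbols('w1:4')   # hidden-layer weights w1, w2, w3
b = sp.symbols('b1:4')   # hidden-layer biases
c = sp.symbols('c1:4')   # output-layer weights
d = sp.symbols('d')      # output-layer bias

# Single hidden layer of width 3 with activation t -> 1/t.
f = d + sum(ci / (wi * x + bi) for ci, wi, bi in zip(c, w, b))

num, den = sp.fraction(sp.together(f))
print(sp.factor(den))                       # denominator splits into the factors (w_i*x + b_i)
print(sp.degree(num, x), sp.degree(den, x)) # degrees of the network output in x
```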
6. Limitations, Challenges, and Future Directions
Despite their strong theoretical and empirical properties, RationalNets have boundaries and areas requiring further investigation:
- In structured domains such as GNNs, not all non-polynomial activations (e.g., rational) guarantee maximal expressivity; uniform expressibility may be strictly less than for ReLU activations (Khalife, 2023).
- The management of singularities in the activation (e.g., from $1/x$ or other rational forms) warrants careful architectural and numerical design, especially in optimization.
- Open problems remain regarding the generalization properties of networks with adaptive rational activations in environments with sparse or imbalanced data, and the precise alignment of algebraic invariants (such as Erzeugungsgrad and Krull dimension) with empirical learning curves.
- Investigation into further approximation properties (e.g., simultaneous approximation of derivatives in Sobolev-type norms) and analysis of rational approximations in deeper, more heterogeneous architectures (including convolutional and attention-based models) are ongoing research avenues.
- In high-dimensional settings, stability and conditioning issues in rational approximation algorithms (e.g., AAA or differential correction methods) become pronounced, necessitating new numerical schemes and regularization techniques (Peiris et al., 2023).
- The implementation of rational layers in large-scale frameworks requires efficient parameterization (e.g., low-degree parameter sharing or recurrent rational units) and robust backpropagation under potentially complex rational derivatives.
7. Concluding Perspective
Rational Neural Networks represent an intersection of approximation theory, algebraic geometry, and practical machine learning. By leveraging the exponential expressivity and compactness of rational functions, they address efficiency and accuracy limitations inherent in conventional neural architectures, while remaining amenable to theoretical analysis and interpretability. Whether as function approximators in scientific regression, feature extractors in signal processing, or robust controllers in closed-loop systems, RationalNets offer a principled and versatile approach that is increasingly substantiated by both deep mathematical structure and empirical performance. The continued development of their theory, optimization, and application is poised to further expand the horizon of function-approximation methods in machine learning and computational mathematics.