Integral Transformed Neural Networks
- Integral Transformed Neural Networks are neural architectures that explicitly use integral operators to map between function spaces for feature extraction and solution construction.
- They integrate classical operators like Abel, Fredholm, or Radon transforms into both continuous and discretized learning pipelines for robust forward and inverse problem-solving.
- These architectures demonstrate improved robustness and approximation capabilities in operator regression, physics-informed learning, and constrained function representation with proven universal approximation properties.
An Integral Transformed Neural Network (ITNN) is a neural architecture in which key operations—feature extraction, operator mapping, representation, or solution construction—are explicitly formulated via integral transforms, either as part of the architectural design, the functional representation theorem, or the discretized learning pipeline. The term encompasses a spectrum of approaches: networks built around the inversion or application of classical integral operators (e.g., Abel, Fredholm, or Radon), neural realizations of generalized or data-driven integral kernels for operator learning, architectures employing integral-based activation or transfer layers, and “universal operator” networks parameterizing nonlinear maps between function spaces via neural approximations of integral kernels and their primitives. ITNNs have been developed for forward and inverse problems, operator regression, physics-informed learning, constrained function representation, and discretization-invariant learning, among others.
1. Mathematical Foundations of Integral Transformed Neural Networks
The mathematical framework underpinning ITNNs is the parametrization of mappings between (possibly infinite-dimensional) function spaces via integral operators of the form $(\mathcal{K}u)(x) = \int_{\Omega} k(x, y)\,u(y)\,dy$, for a kernel $k$ encoding nonlocal dependencies. Various ITNN variants make this explicit:
- In inversion problems, the forward integral operator $T$ is realized (or inverted) via a network architecture (Chouzenoux et al., 2021).
- In operator learning, the target operator is represented as a generalized integral transform, possibly with nonlinear components and neural or adaptive kernels and bases (Wang et al., 2023).
- Continuous-width formulations and basis expansions formalize layers via integral kernels, e.g. $y(t) = \int_{S} W(t, s)\,\sigma(x(s))\,ds$ (Zhang et al., 2023).
- In the integral representation of ReLU networks, every sufficiently regular, compactly supported $f$ can be written as $f(x) = \int \sigma(\langle w, x\rangle - b)\,d\mu(w, b)$, a Green's-function-type integral mapping of a weight measure through the ReLU ridge kernel (Petrosyan et al., 2019).
Discretized versions leverage basis expansions (e.g., PCA, Fourier) yielding optimization in finite dimensions while maintaining operator-theoretic rigor (Wang et al., 2023). Extensions to functional Banach/Hölder spaces via generalized Gavurin integral operators have yielded proofs of universal approximation for operator-valued mappings (Zappala et al., 2024).
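As a concrete illustration of this viewpoint, the sketch below applies a kernel integral operator $(\mathcal{K}u)(x) = \int_{0}^{1} k(x, y)\,u(y)\,dy$ whose kernel is a small (randomly initialized) neural network, discretized by trapezoidal quadrature on a uniform grid. The kernel parameterization and the helper names (`kernel_net`, `apply_integral_operator`) are illustrative assumptions, not taken from any of the cited architectures.

```python
# Illustrative sketch: a neural-kernel integral operator discretized by
# trapezoidal quadrature. Parameterization and names are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy neural kernel k_theta(x, y): a 2 -> 16 -> 1 MLP with tanh activation.
W1, b1 = rng.normal(size=(16, 2)), rng.normal(size=16)
W2, b2 = rng.normal(size=(1, 16)), rng.normal(size=1)

def kernel_net(x, y):
    """Evaluate the parameterized kernel k_theta at a point pair (x, y)."""
    h = np.tanh(W1 @ np.array([x, y]) + b1)
    return (W2 @ h + b2).item()

def apply_integral_operator(u_vals, grid):
    """Approximate (K u)(x_i) = integral of k(x_i, y) u(y) dy on the grid."""
    n = len(grid)
    w = np.full(n, grid[1] - grid[0])   # trapezoidal quadrature weights
    w[0] *= 0.5
    w[-1] *= 0.5
    K = np.array([[kernel_net(x, y) for y in grid] for x in grid])  # kernel matrix
    return K @ (w * u_vals)             # quadrature-weighted matrix-vector product

grid = np.linspace(0.0, 1.0, 64)
u = np.sin(2 * np.pi * grid)            # input function sampled on the grid
Ku = apply_integral_operator(u, grid)
print(Ku.shape)                         # (64,): output function on the same grid
```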
2. Architectural Realizations and Integral Operator Discretization
Various architectural blueprints instantiate integral transforms in practical ITNNs:
- Unrolled iterative solvers: Forward–backward splitting for an inverse problem (e.g., Abel inversion) results in a multi-layer network in which each layer performs a prox-gradient update whose operations are dictated by the integral operator $T$ and its adjoint $T^{*}$, with explicit bias injection of the measurement data $y$ through a term of the form $T^{*}y$ at every layer (Chouzenoux et al., 2021); see the sketch after this list.
- Integral Activation Transforms (IAT): Layers are constructed in which the nonlinearity is implemented not pointwise but as an integral against parameterized basis functions, imparting both smoothness and continuous adaptability to the activation structure (Zhang et al., 2023).
- Operator learning via basis transforms: GIT-Net parameterizes the forward operator with per-frequency "kernels" in the PCA or harmonic basis, with the core layer applying the learned per-frequency maps in that basis and then projecting back to the spatial domain via learned mixing matrices (Wang et al., 2023).
- Integral autoencoder blocks: In IAE-Net, encoders and decoders are themselves discretized integral transforms with trainable neural kernels (modeled as small MLPs), allowing for grid independence and discretization-invariant learning (Ong et al., 2022).
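A minimal numerical sketch of the unrolled-solver idea, assuming a crude quadrature matrix for a Volterra-type forward operator, an $\ell_1$ proximal step, and hand-picked (rather than learned) per-layer parameters; this is a simplified stand-in for, not a reproduction of, the architecture of Chouzenoux et al. (2021):

```python
# Sketch of unrolled forward-backward layers for a discretized integral
# inverse problem T x ~ y (illustrative choices throughout).
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * l1-norm (one simple choice of regularizer)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def layer_step(x, T, y, gamma, tau):
    """One layer: gradient step on 0.5 * ||T x - y||^2, then a proximal step.

    The bias term T.T @ y injects the measurement data into every layer; in an
    unrolled network, gamma and tau would be learned layer by layer.
    """
    grad = T.T @ (T @ x - y)        # adjoint of the integral operator appears here
    return soft_threshold(x - gamma * grad, gamma * tau)

rng = np.random.default_rng(1)
n = 50
T = np.tril(np.ones((n, n))) / n    # crude quadrature matrix of a Volterra-type operator
x_true = np.zeros(n)
x_true[10:20] = 1.0
y = T @ x_true + 0.01 * rng.normal(size=n)

x = np.zeros(n)
for _ in range(200):                # depth = number of unrolled layers
    x = layer_step(x, T, y, gamma=1.0, tau=1e-4)
print(np.linalg.norm(x - x_true) / np.linalg.norm(x_true))  # relative reconstruction error
```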
Discretization strategies include spectral projections, analytical eigenbasis, Gaussian or custom quadrature (for singular or infinite domains), random quadrature shifts (for avoiding overfitting), and basis interpolation for non-rectangular or irregular geometries (Chouzenoux et al., 2021, Ong et al., 2022, Aghaei et al., 2024).
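For example, a Gauss-Legendre rule mapped to $[0,1]$ turns an integral layer into a weighted contraction against kernel evaluations at the quadrature nodes; the sketch below is a generic recipe with an arbitrary fixed kernel chosen purely for illustration.

```python
# Gauss-Legendre discretization of an integral layer on [0, 1] (generic sketch).
import numpy as np

def gauss_legendre_on_01(n):
    """Nodes and weights for integrals over [0, 1], mapped from the [-1, 1] rule."""
    nodes, weights = np.polynomial.legendre.leggauss(n)
    return 0.5 * (nodes + 1.0), 0.5 * weights

s, w = gauss_legendre_on_01(16)

# Integral "layer" output at query points x: z(x) = integral of g(x, s) u(s) ds,
# here with a fixed illustrative kernel g(x, s) = exp(-|x - s|).
x = np.linspace(0.0, 1.0, 5)
u = np.cos(np.pi * s)                           # input function at the quadrature nodes
G = np.exp(-np.abs(x[:, None] - s[None, :]))    # kernel evaluated on all pairs (x_i, s_j)
z = G @ (w * u)                                 # quadrature-weighted contraction
print(z)                                        # layer output at the query points
```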
3. Learning Procedures and Theoretical Guarantees
Integral-based neural representations impose unique training methodologies and permit strong theoretical control:
- Unfolded minimization: In inverse settings, the whole network corresponds to the iterations of a proximal splitting algorithm minimizing a Tikhonov-regularized objective, with layerwise-learned stepsizes, regularization weights, and nonlinear barrier enforcement for constraints. The data term (bias) is injected at every layer (Chouzenoux et al., 2021).
- Function space constraints: In Fixed-Integral Neural Networks, the primitive $F$ is optimized so that its mixed partial derivative $\partial^{d}F/\partial x_{1}\cdots\partial x_{d}$ yields the target density $f$, imposing (via normalization) a prescribed integral value and positivity through activation and weight design, supported by closed-form antiderivative computations (Kortvelesy, 2023); a simplified 1D sketch appears after this list.
- Operator norm and robustness: Layerwise constructions and spectral choices allow explicit calculation of operator norms and Lipschitz constants, leading to robustness guarantees against perturbations in both data and network biases. For unrolled solvers, the overall Lipschitz constant contracts exponentially with depth after training (Chouzenoux et al., 2021).
- Universal approximation: Recent results based on the Gavurin integral operator framework guarantee that ITNNs can approximate any Fréchet-smooth operator between Banach spaces to arbitrary accuracy on compacts, provided sufficient architectural capacity and suitable gating/partition-of-unity schemes (Zappala et al., 2024).
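A simplified 1D sketch of the fixed-integral construction, assuming a toy primitive built from nonnegatively weighted sigmoids so that its closed-form derivative is a nonnegative density; the parameterization is hypothetical and far simpler than the FINN architecture of Kortvelesy (2023).

```python
# Toy fixed-integral construction in 1D: represent the primitive F, differentiate
# it in closed form, and normalize by F(b) - F(a) so the density integrates to 1.
import numpy as np

rng = np.random.default_rng(2)
K = 8
w = np.abs(rng.normal(size=K))          # nonnegative output weights
a = np.abs(rng.normal(size=K)) + 0.5    # nonnegative slopes => monotone primitive
b = rng.normal(size=K)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def F(x):
    """Primitive (antiderivative) represented by the toy network."""
    return np.sum(w * sigmoid(a * x[..., None] + b), axis=-1)

def f(x):
    """Closed-form derivative of F: nonnegative by construction."""
    s = sigmoid(a * x[..., None] + b)
    return np.sum(w * a * s * (1.0 - s), axis=-1)

lo, hi = -3.0, 3.0
Z = float(F(np.array(hi)) - F(np.array(lo)))     # exact integral of f over [lo, hi]
xs = np.linspace(lo, hi, 2001)
density = f(xs) / Z                              # prescribed integral: exactly 1 over [lo, hi]
check = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(xs))
print(check)                                     # ~1.0, confirming the constraint numerically
```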
4. Key Applications: Forward/Inverse Problems and Operator Learning
ITNNs have been systematically applied to:
- Solving integral equations and inverse problems: Unrolled architectures efficiently and robustly invert Abel, Laplace, or Fredholm-type integral operators, even in the presence of noise, outperforming Kalman filter and Fourier inversion under realistic noise (Chouzenoux et al., 2021).
- Operator regression and PDE solution surrogates: GIT-Net and IAE-Net provide parsimonious and accurate learned representations for mappings between function spaces, exhibiting computational superiority and generalization on complex and irregular geometries relative to FNO/POD-DeepONet (Wang et al., 2023, Ong et al., 2022).
- Physics-informed learning with integral constraints: PINNIES and related frameworks implement fast and accurate integral layers (via Gaussian quadrature or fast matrix-vector products) for both forward and inverse problems in multi-dimensional, fractional, or singular-kernel settings, with rigorous error and efficiency benchmarks (Aghaei et al., 2024); a fractional-integral discretization is sketched after this list.
- Analytic and constraint-prescribed function representation: FINN architectures develop exact or efficiently computable integral representations enabling physically or statistically meaningful constraints, such as enforcing normalization ($\int f(x)\,dx = 1$) or positivity of densities (Kortvelesy, 2023).
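As an example of the matrix-vector viewpoint, the weakly singular Riemann-Liouville fractional integral can be precomputed as a lower-triangular operational matrix by integrating the singular kernel exactly on each subinterval; the product-rectangle rule below is a generic construction for illustration, not the specific matrices used in PINNIES.

```python
# Riemann-Liouville fractional integral of order alpha,
#   (I^alpha u)(x) = 1/Gamma(alpha) * integral_0^x (x - t)^(alpha - 1) u(t) dt,
# reduced to a lower-triangular matrix-vector product (generic sketch).
import numpy as np
from math import gamma

def fractional_integral_matrix(x, alpha):
    """Matrix A with (A @ u)[i] approximating (I^alpha u)(x[i]).

    The singular kernel is integrated exactly on each subinterval, with u
    treated as piecewise constant (left-endpoint values): a first-order rule.
    """
    n = len(x)
    A = np.zeros((n, n))
    for i in range(1, n):
        for j in range(i):
            lo, hi = x[j], x[j + 1]
            A[i, j] = ((x[i] - lo) ** alpha - (x[i] - hi) ** alpha) / (alpha * gamma(alpha))
    return A

x = np.linspace(0.0, 1.0, 200)
A = fractional_integral_matrix(x, alpha=0.5)
u = x                                    # test input u(t) = t
exact = x ** 1.5 / gamma(2.5)            # known result: I^{1/2} t = t^{3/2} / Gamma(5/2)
print(np.max(np.abs(A @ u - exact)))     # first-order discretization error
```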
5. Representational Insights and Functional-Analytic Perspectives
Integral transform characterizations offer rigorous understanding of neural network expressivity:
- Continuum representations: Theorems for shallow ReLU networks show that any sufficiently regular, compactly supported function $f$ can be represented exactly as an integral transform over the unit sphere with the ReLU ridge kernel $\sigma(\langle w, x\rangle - b)$, with the weight density determined by higher-order derivatives of $f$, giving minimal "path norm" representations and Barron-type generalization bounds (Petrosyan et al., 2019); a random-feature discretization of this representation is sketched after this list.
- Hypersurface and Radon-geometric views: Each layer of a neural network can be conceptualized as integrating the input density along hypersurfaces defined by pre-activation level sets (generalized Radon transforms), connecting nonlinearity and pooling to geometric properties of these integral transforms (Kolouri et al., 2019). This provides a geometric interpretation for adversarial vulnerability as small shifts transporting samples across hypersurfaces.
- Basis diagonalization and parsimonious structure: Adaptive basis constructions (e.g., in GIT-Net) can diagonalize or nearly-diagonalize underlying PDE operators, uncovering low-dimensional representations of complex operator structure (Wang et al., 2023).
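The continuum representation in the first bullet can be probed numerically by a Monte Carlo discretization: sampling ridge directions and offsets and fitting coefficients by least squares plays the role of the weight density. The 1D random-features sketch below is a generic illustration, not the constructive procedure of Petrosyan et al. (2019).

```python
# Monte Carlo discretization of the ReLU integral representation in 1D:
# f(x) ~ sum_i c_i * relu(w_i * x - b_i), with (w_i, b_i) sampled at random
# and c_i playing the role of a discretized weight density (generic sketch).
import numpy as np

rng = np.random.default_rng(3)
xs = np.linspace(-1.0, 1.0, 400)
target = np.sin(3 * xs) * np.exp(-xs ** 2)       # smooth target on [-1, 1]

M = 200
w = rng.choice([-1.0, 1.0], size=M)              # the "unit sphere" in 1D is {-1, +1}
b = rng.uniform(-1.0, 1.0, size=M)               # offsets sampled over the support
Phi = np.maximum(w[None, :] * xs[:, None] - b[None, :], 0.0)   # ReLU ridge features

coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)   # surrogate for the weight density
print(np.max(np.abs(Phi @ coef - target)))            # small uniform approximation error
```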
6. Limitations, Extensions, and Future Directions
While ITNN architectures have demonstrated empirical and theoretical efficacy, several directions remain active:
- Depth and computational trade-offs: Unrolled and integral layers provide finitely deep approximations of operator inversion/minimization; convergence to variational optima is only as good as the chosen depth or grid resolution (Chouzenoux et al., 2021).
- Quadrature and basis design: High-degree singularities, non-rectangular domains, or high-dimensional integrals necessitate carefully chosen quadrature schemes (e.g., adaptive or spectral), and the hyperparameters governing basis choice (e.g., finite rank, smoothness, localization) can affect both trainability and expressiveness (Aghaei et al., 2024, Zhang et al., 2023).
- Generalization and regularization: Universal approximation results do not guarantee generalization or sample complexity bounds; practical architectures require strong regularization (e.g., Sobolev norm penalties, data augmentation) to avoid overfitting and to control operator norm spikes (Zappala et al., 2024, Ong et al., 2022).
Extensions under investigation include hierarchical or low-rank integral operator parameterizations for scalability, joint learning of bases, application to high-dimensional or stochastic operator settings, and generalization of the integral-transform paradigm to self-attention (transformers), convolutional, or graph-based architectures (Zappala et al., 2024, Wang et al., 2023).
Principal References:
- Inversion of Integral Models: a Neural Network Approach (Chouzenoux et al., 2021)
- Neural network integral representations with the ReLU activation function (Petrosyan et al., 2019)
- Fixed Integral Neural Networks (Kortvelesy, 2023)
- Integral Transforms in a Physics-Informed (Quantum) Neural Network setting (Kumar et al., 2022)
- IAE-Net: Integral Autoencoders for Discretization-Invariant Learning (Ong et al., 2022)
- Universal Approximation of Operators with Transformers and Neural Integral Operators (Zappala et al., 2024)
- PINNIES: An Efficient Physics-Informed Neural Network Framework to Integral Operator Problems (Aghaei et al., 2024)
- GIT-Net: Generalized Integral Transform for Operator Learning (Wang et al., 2023)
- Neural Networks, Hypersurfaces, and Radon Transforms (Kolouri et al., 2019)