Neural Network-Based Methods

Updated 20 April 2026

Neural network-based methods are computational frameworks that use artificial neural networks to approximate complex, nonlinear relationships for diverse applications.
They employ a variety of architectures, including CNNs, RNNs, and hybrid models, to solve tasks ranging from boundary value problems to image segmentation by minimizing task-specific loss functions.
Advances in physics-informed losses, automatic differentiation, and hardware acceleration have enhanced their accuracy, transferability, and scalability across multiple disciplines.

Neural network-based methods are computational frameworks that utilize artificial neural networks (ANNs) as core algorithmic components for solving diverse problems across scientific computing, data analysis, pattern recognition, numerical modeling, and automation. These methods leverage the universal approximation properties of neural networks to model complex, nonlinear mappings from data, physical laws, or operational rules, and have seen widespread adoption due to advances in scalable optimization, automatic differentiation, and high-performance hardware. The term collectively encompasses a vast array of approaches including supervised and unsupervised learning, variational problem solvers, deep architectures for high-dimensional data, and hybrid schemes integrating classical mathematical models with neural representations.

1. Foundational Principles and Algorithmic Structure

The defining feature of neural network-based methods is the replacement or augmentation of analytic models, hand-crafted features, or traditional numerical schemes with parametrized function approximators (usually feedforward or recurrent neural networks) trained to optimize task-specific objectives. In the context of scientific problems, such as boundary value problems (BVPs), a common paradigm is to represent the unknown field $\psi(x)$ by a neural network output $N(x; W, b)$, composed with auxiliary functions that enforce exact satisfaction of boundary conditions. Trial solutions take the form

$\psi_t(x; W, b) = \hat{\psi}(x) + F(x) \cdot N(x; W, b),$

where $\hat{\psi}(x)$ satisfies the prescribed boundary conditions, $F(x)$ vanishes on the boundary $\partial D$ of the domain $D$, and $N$ is a fully connected network mapping spatial coordinates to scalar predictions. The method proceeds by minimizing a residual loss over a set of (possibly unstructured) collocation points $x_1, \dots, x_m$, typically the squared error of the governing equation evaluated at these points:

$L_\text{PDE}(W, b) = \sum_{i=1}^m [G(x_i, \psi_t(x_i), \nabla \psi_t(x_i), \nabla^2 \psi_t(x_i))]^2,$

where $G = 0$ is the governing differential equation written in residual form, including any source or data terms (Kolluru, 2019).
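As a minimal sketch of this construction, the plain-Python example below treats the one-dimensional problem $\psi''(x) = f(x)$ on $[0, 1]$ with Dirichlet values $\psi(0) = A$ and $\psi(1) = B$. The choices $\hat{\psi}(x) = A(1 - x) + Bx$ and $F(x) = x(1 - x)$, the single hidden tanh layer, the source term, and all function names are illustrative assumptions, not taken from the cited work. Derivatives are written out by hand here only to keep the sketch dependency-free.

```python
import math
import random

A, B = 1.0, 3.0  # assumed Dirichlet values psi(0) = A, psi(1) = B


def network(x, params):
    """Single-hidden-layer tanh network N(x) and its first two x-derivatives."""
    n = dn = d2n = 0.0
    for w, b, v in params:           # (input weight, bias, output weight) per unit
        t = math.tanh(w * x + b)
        s = 1.0 - t * t              # derivative of tanh at w*x + b
        n += v * t
        dn += v * w * s
        d2n += v * w * w * (-2.0 * t * s)
    return n, dn, d2n


def trial_solution(x, params):
    """psi_t(x) = psi_hat(x) + F(x) * N(x); exact on the boundary by construction."""
    n, _, _ = network(x, params)
    return A * (1.0 - x) + B * x + x * (1.0 - x) * n


def trial_second_derivative(x, params):
    """psi_t'' = F'' N + 2 F' N' + F N'' (psi_hat is linear, so psi_hat'' = 0)."""
    n, dn, d2n = network(x, params)
    f, df, d2f = x * (1.0 - x), 1.0 - 2.0 * x, -2.0
    return d2f * n + 2.0 * df * dn + f * d2n


def residual_loss(params, points, source):
    """Sum of squared residuals of psi'' - f = 0 at the collocation points."""
    return sum((trial_second_derivative(x, params) - source(x)) ** 2 for x in points)


random.seed(0)
params = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
points = [i / 10 for i in range(1, 10)]   # interior collocation points
loss = residual_loss(params, points, source=lambda x: math.sin(math.pi * x))
print(trial_solution(0.0, params), trial_solution(1.0, params), loss)
```

Because $F$ vanishes at both endpoints, the boundary values hold for any parameter choice, so training only has to reduce the interior residual.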

For image processing or sequence modeling, ANNs are used as direct parametric models mapping from raw data (pixels, voxels, time series) to structured outputs (segmentations, classifications, regression targets). Architectures typically employ convolutional layers for spatial invariance and feature extraction (e.g., U-Net, DenseNet), sometimes augmented with recurrent units (LSTM, GRU) for temporal aggregation or dependency modeling (Srivastava et al., 2020, Buser et al., 2021).
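One reason convolutional layers suit image data is that shifting the input shifts the feature map by the same amount. The sketch below, in plain Python, illustrates this with a single hand-picked filter. The toy image, the kernel values, and the helper names are assumptions for illustration; in a trained CNN the kernel weights are learned.

```python
def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation, the operation a convolutional layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Elementwise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def shift_right(image, fill=0.0):
    """Shift every row one pixel to the right, padding on the left."""
    return [[fill] + row[:-1] for row in image]


# Toy 5x6 image with a bright vertical bar in column 2 (hypothetical data).
image = [[1.0 if col == 2 else 0.0 for col in range(6)] for _ in range(5)]
# Hand-picked horizontal-gradient kernel (3 identical rows).
kernel = [[-1.0, 0.0, 1.0]] * 3

features = relu(conv2d_valid(image, kernel))
shifted_features = relu(conv2d_valid(shift_right(image), kernel))
```

Away from the padded border, `shifted_features` is `features` moved one column to the right, which is the weight-sharing behaviour that dense layers lack.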

2. Model Architectures and Design Paradigms

Neural network-based methods exhibit considerable architectural diversity, adapted to the modalities and objectives of each task:

Feedforward Dense Networks: Employed in regression, classification, and physics-inspired trial functions, utilizing layers of affine maps and nonlinear activations (e.g., sigmoid, ReLU, tanh) (Kolluru, 2019).
Convolutional Neural Networks (CNNs): Architectures such as DenseNet-121, ResNet, and U-Net employ convolutional blocks for extraction and hierarchical abstraction of image features, with dilations or skip connections to preserve resolution (Srivastava et al., 2020, Buser et al., 2021, Frid et al., 13 Nov 2025).
Recurrent and Time-Distributed Networks: Stacked GRUs or time-distributed convnets for volumetric medical data, enabling context-sensitive inference over sequences of slices or states (Srivastava et al., 2020).
Hybrid and Element Methods: Mesh-based neural element methods locally enrich finite element basis functions with neural subnetworks, allowing adaptive resolution and boundary-aware approximation (Wang et al., 23 Apr 2025).
Cell-Average Networks: 1D/2D cellular or stencil networks trained to propagate cell averages between time steps, replacing explicit fluxes with black-box neural updates (Qiu et al., 2021).
Variational and Ritz Networks: Neural networks are trained via energy or residual minimization, including generalized Ritz methods, adversarial loss formulations, and dual nested architectures to optimize in both trial and test spaces (Uriarte, 2024).

The choice of architecture is dictated by the governing data structure, desired inductive biases, boundary and initial conditions, and computational constraints.
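The recurrent aggregation pattern listed above, in which per-slice feature vectors are summarized by a GRU, can be sketched in plain Python as follows. This is a generic GRU update under one common gating convention, with biases omitted for brevity; the dimensions, random weights, and names such as `aggregate_slices` are hypothetical and do not reproduce any cited model.

```python
import math
import random


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x, h, p):
    """One GRU update; p holds input matrices W_* and recurrent matrices U_*."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["W_z"], x), matvec(p["U_z"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["W_r"], x), matvec(p["U_r"], h))]
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["W_h"], x), matvec(p["U_h"], gated_h))]
    # Convex combination of the previous state and the candidate state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def aggregate_slices(slice_features, params, hidden_size):
    """Run the GRU over per-slice feature vectors and return the final state."""
    h = [0.0] * hidden_size
    for x in slice_features:
        h = gru_step(x, h, params)
    return h


random.seed(0)
FEATURES, HIDDEN = 3, 2


def random_matrix(rows, cols):
    return [[random.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]


params = {
    name: random_matrix(HIDDEN, FEATURES if name.startswith("W") else HIDDEN)
    for name in ("W_z", "U_z", "W_r", "U_r", "W_h", "U_h")
}
# Stand-ins for the per-slice outputs of a convolutional feature extractor.
slices = [[0.2, -0.1, 0.5], [0.0, 0.3, -0.4], [0.1, 0.1, 0.1]]
summary = aggregate_slices(slices, params, HIDDEN)
```

The final state `summary` would then feed a classification head; its fixed size is what lets the model handle a variable number of slices.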

3. Loss Formulation, Optimization, and Training Protocols

Loss functions in neural network-based methods are tailored to the target problem:

Physics-Informed Losses: For PDEs, residual minimization over collocation points or energy functional minimization (e.g., Ritz or variational forms) directly encodes the governing equations. Automatic differentiation provides exact gradients of neural functions with respect to both input and network parameters, circumventing the need for analytic Jacobians (Kolluru, 2019, Wang et al., 23 Apr 2025).
Supervised Cross-Entropy or Regression Loss: Standard for classification (e.g., in medical imaging segmentation, photon identification) and regression tasks, possibly with class weighting for imbalanced datasets (Srivastava et al., 2020, Buser et al., 2021, Frid et al., 13 Nov 2025).
Penalty and Barrier Functions: Penalty terms are introduced for soft enforcement of constraints (e.g., obstacle problems, weak boundary enforcement), often with continuation (homotopy) to facilitate convergence (Zhao et al., 2021).
Ensemble and Hybridized Losses: For uncertainty quantification and regularization, loss functions may combine ensemble averaging, auxiliary regression heads, or physics-informed label smoothing (Frid et al., 13 Nov 2025, Scher et al., 2020).
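To make the role of automatic differentiation in physics-informed losses concrete, the sketch below implements forward-mode differentiation with dual numbers in plain Python. It is a teaching illustration only: deep learning frameworks typically use reverse-mode differentiation for parameter gradients, and the `Dual` class, the `tiny_network` function, and its fixed parameters are assumptions made for this example.

```python
import math


class Dual:
    """Number a + b*eps with eps**2 = 0; the eps coefficient carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    @staticmethod
    def lift(other):
        return other if isinstance(other, Dual) else Dual(float(other))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule applied to the derivative component.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__


def tanh(x):
    """tanh on dual numbers, using d/dx tanh(x) = 1 - tanh(x)**2."""
    t = math.tanh(x.value)
    return Dual(t, (1.0 - t * t) * x.deriv)


def tiny_network(x, w=1.5, b=-0.3, v=0.8):
    """N(x) = v * tanh(w*x + b) with hypothetical fixed parameters."""
    return v * tanh(w * x + b)


def derivative(f, x0):
    """Evaluate f'(x0) by seeding the input with derivative 1."""
    return f(Dual(x0, 1.0)).deriv


x0 = 0.4
ad_value = derivative(tiny_network, x0)
exact = 0.8 * 1.5 * (1.0 - math.tanh(1.5 * x0 - 0.3) ** 2)
```

Here `ad_value` agrees with the hand-derived `exact` to floating-point precision, with no finite-difference step size to tune.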

Optimization is typically performed with variants of stochastic gradient descent such as Adam, or with quasi-Newton methods such as L-BFGS. Learning-rate annealing, gradient clipping, and regularization (L2 weight decay, dropout) are introduced to stabilize training and improve generalization. Detailed monitoring of both residual loss and physically relevant error norms (e.g., relative $L^2$ norm, absolute error surfaces) is essential to detect overfitting and guide hyperparameter selection (Kolluru, 2019).
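The following plain-Python sketch combines three of the ingredients named above: the Adam update, exponential learning-rate annealing, and gradient clipping by global norm. The quadratic toy objective, the hyperparameter values, and the function names are assumptions chosen for illustration, not recommended settings.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale the gradient vector if its Euclidean norm exceeds max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        return grads
    return [g * max_norm / norm for g in grads]


def adam_minimize(grad_fn, theta, steps=500, lr0=0.1, decay=0.99,
                  beta1=0.9, beta2=0.999, eps=1e-8, max_norm=1.0):
    """Adam with exponential learning-rate decay and global-norm clipping."""
    m = [0.0] * len(theta)
    v = [0.0] * len(theta)
    theta = list(theta)
    for t in range(1, steps + 1):
        grads = clip_by_global_norm(grad_fn(theta), max_norm)
        lr = lr0 * decay ** (t - 1)               # annealed step size
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g
            v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
            m_hat = m[i] / (1.0 - beta1 ** t)     # bias-corrected first moment
            v_hat = v[i] / (1.0 - beta2 ** t)     # bias-corrected second moment
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


TARGET = [1.0, 2.0]  # minimizer of the toy objective


def loss(theta):
    return sum((p - q) ** 2 for p, q in zip(theta, TARGET))


def grad(theta):
    return [2.0 * (p - q) for p, q in zip(theta, TARGET)]


start = [5.0, -3.0]
final = adam_minimize(grad, start)
print(loss(start), loss(final))
```

In a real solver `grad` would come from automatic differentiation of the training loss, and the annealing schedule and clipping threshold would be tuned per problem.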

4. Representative Applications and Benchmark Results

Neural network-based methods have demonstrated efficacy across diverse application domains:

Numerical Solution of Boundary Value Problems: For Laplace and Poisson equations on two-dimensional domains with mixed Dirichlet and Neumann boundary conditions, a three-layer fully connected NN achieves low maximum pointwise errors and low relative $L^2$-norm errors, with no mesh generation required and analytic differentiation of trial solutions at arbitrary points (Kolluru, 2019).
Medical Imaging (Intracranial Hemorrhage Detection): Time-distributed DenseNet-121 per slice, aggregated via a GRU, exceeds 92% accuracy and achieves per-study AUC up to 0.980, surpassing other deep CNN and RNN baselines. Federated learning extensions using FedAvg across 10 hospital sites demonstrate near-centralized performance while preserving data privacy (Srivastava et al., 2020).
Mesh-based and Element Methods: Neural network element spaces coupled to FE envelopes on 2D triangles, with local 2-hidden-layer sine-activated subnetworks, yield errors that are orders of magnitude below those of classical FE methods of equivalent polynomial order on coarse grids (Wang et al., 23 Apr 2025).
Cell-Average and Explicit Schemes for Time-Dependent PDEs: CANN achieves first-order convergence, sharp discontinuity capture with minimal numerical diffusion, and stability under large time steps (no CFL restriction), at the expense of retraining for each mesh size (Qiu et al., 2021).
Inverse Problems and Imaging: Neural network-based regularization—including learned generators (GANs, VAEs), score-based diffusion models, and Plug-and-Play denoisers—outperform hand-crafted Tikhonov or TV penalties on ill-posed operator inversion, with extensive theoretical analysis in function spaces (Habring et al., 2023).
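The federated setting mentioned above relies on a server-side aggregation step in which client parameters are averaged with weights proportional to local sample counts. A minimal sketch of that step alone, assuming parameters flattened to short lists and hypothetical site sizes, might look like this; local training, communication, and privacy mechanisms are omitted, and this is not the cited study's implementation.

```python
def federated_average(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[k] for params, n in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


# Hypothetical round with three sites.
site_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 0.0]]
site_sizes = [10, 30, 60]
global_params = federated_average(site_params, site_sizes)
```

Sites with more data pull the global model further toward their local solution, which is why the largest site dominates the result here.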

A selection of empirical results is summarized below:

Application Domain	Architecture/Method	Error/Accuracy
2D Laplace/Poisson BVP	3-layer FCNN	Low max pointwise and relative $L^2$ errors (Kolluru, 2019)
Intracranial Hemorrhage	TimeDist DenseNet-121 + GRU	Accuracy 92.3%, AUC (any): 0.98 (Srivastava et al., 2020)
NN Element Poisson, FE envelope	Patchwise 2HL-sine NNs	Error orders of magnitude below classical FE on coarse meshes (Wang et al., 23 Apr 2025)
Cell-Average Schemes	CANN stencil, MLP with up to 6 layers	Sharp shocks, 1st-order, no CFL restriction (Qiu et al., 2021)

These achievements underscore the flexibility and competitiveness of neural network-based approaches vis-à-vis classical numerical and statistical modeling.

5. Advantages, Limitations, and Best Practices

Advantages:

Universal Approximation: Ability to approximate continuous mappings from input spaces (geometry, data) to outputs to arbitrary accuracy on compact domains, given sufficient depth/width.
Mesh and Grid Flexibility: Many NN solvers permit unstructured, non-uniform collocation points, eliminating the need for mesh generation and conforming discretization (Kolluru, 2019).
Compositionality and Transferability: Architectures can incorporate domain knowledge (e.g., exact boundary satisfaction, physical symmetries) through trial solution design and loss engineering.
Hardware Acceleration: Fully parallelizable training/inference leveraging GPU/TPU/ASIC hardware enables tractability for large-scale, high-dimensional problems (Lin et al., 2023).

Limitations:

Hyperparameter Sensitivity: Performance can saturate or degrade with suboptimal choice of network size, training grid, regularization, and learning rate (Kolluru, 2019).
Mesh/Step Dependency: For certain classes (e.g., mesh-based explicit solvers) each change in discretization may require retraining (Qiu et al., 2021).
Lack of Theoretical Guarantees: Minimizing collocation residuals alone does not guarantee convergence to the true solution across the domain, so out-of-sample error monitoring is required.
Computational Overhead: Training can be expensive relative to hand-tuned schemes or classic linear solvers, particularly when large networks or ensemble models are used.
Limited Transparency: While automatic differentiation and closed-form outputs are beneficial, interpretability of learned parameters is often inferior to classical coefficients.

Best Practices:

Construct trial solutions that exactly enforce boundary/initial conditions so residual loss focuses on the interior problem.
Use automatic differentiation to compute all derivatives of the network and trial solution.
Select network architecture (depth/width) and the number of training collocation points by cross-validation.
Employ held-out test points for monitoring generalization and detecting overfitting even if training residuals decrease (Kolluru, 2019).
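The last practice can be sketched as a small monitoring routine in plain Python. It assumes a reference solution is available at held-out points so that a relative $L^2$ error can be computed; the function names, the patience rule, and the example history are hypothetical.

```python
import math


def relative_l2_error(predicted, exact):
    """||predicted - exact||_2 / ||exact||_2 over a set of sample points."""
    num = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, exact)))
    den = math.sqrt(sum(e * e for e in exact))
    return num / den


def overfitting_epoch(train_losses, heldout_errors, patience=3):
    """Return the first epoch at which the held-out error has risen for
    `patience` consecutive epochs while the training loss kept falling,
    or None if that never happens."""
    run = 0
    for k in range(1, len(train_losses)):
        rising = heldout_errors[k] > heldout_errors[k - 1]
        falling = train_losses[k] < train_losses[k - 1]
        run = run + 1 if (rising and falling) else 0
        if run >= patience:
            return k
    return None


# Hypothetical history: the training residual keeps shrinking, but the
# held-out error turns upward after epoch 3.
train_hist = [1.0, 0.5, 0.3, 0.2, 0.15, 0.12, 0.10]
heldout_hist = [0.9, 0.6, 0.4, 0.35, 0.38, 0.42, 0.47]
flagged = overfitting_epoch(train_hist, heldout_hist)
```

In practice `heldout_hist` would be filled by calling `relative_l2_error` on points never used as collocation points, and a flagged epoch would trigger early stopping or stronger regularization.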

6. Emerging Trends and Theoretical Developments

Recent advancements extend neural network-based methods across domains and methodologies:

Hybrid FE-NN Solvers: Local neural enrichment within finite elements combines geometric fidelity with high expressivity, enabling the handling of singularities and complex boundary conditions (Wang et al., 23 Apr 2025).
Federated and Distributed Training: Federated averaging and privacy-preserving update schemes address data locality and patient confidentiality, with only minor trade-offs in convergence and downstream accuracy (Srivastava et al., 2020).
Low-Cost Inference and Model Compression: Sparsity-inducing methods, quantization for low-resource edge devices, and lossless parameter reduction via empirical linearity in ReLU networks all contribute to the practical deployment of neural solutions (Lin et al., 2023).
Function-Space Analysis: NETT, deep null-space and generator-based approaches have established convergence, stability, and error control for data-driven regularizers in inverse problems, including in Banach and infinite-dimensional settings (Habring et al., 2023).
Broadening Application Scope: Deployment of these methods encompasses PDE-constrained optimization, image processing, scientific discovery, uncertainty quantification, and non-traditional tasks such as meshless clustering or combinatorial quantization (BSQ) (Kajo et al., 2021).
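As one concrete instance of the compression techniques listed above, the sketch below applies symmetric uniform 8-bit quantization to a weight vector in plain Python. This is a generic textbook scheme with a single shared scale, shown under the assumption of per-tensor quantization; it is not the method of any cited paper, and the example weights are made up.

```python
def quantize_int8(weights):
    """Symmetric uniform quantization to integers in [-127, 127], one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale for q in codes]


weights = [0.42, -1.27, 0.003, 0.9, -0.55]   # hypothetical layer weights
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The reconstruction error per weight is bounded by half the scale, so a single large outlier weight coarsens the grid for all others; per-channel scales are a common remedy.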

Ongoing work aims to address generalization to complex/nonlinear systems, adaptive/anisotropic schemes, robustness under model misspecification, and integration with classical solvers for hybrid workflows.

7. Future Directions

Open research challenges and promising directions include:

Scalability to High Dimensions: Further algorithmic innovations are needed to maintain accuracy and training stability as problem dimensionality increases (e.g., through dimension-agnostic architectures or tailored sampling) (Lu et al., 2022).
Rigorous Benchmarking and Theory: Development of standardized evaluation sets, theoretical error bounds for complicated domains or operator classes, and proofs of convergence for hybrid and adversarial losses (Habring et al., 2023, Qiu et al., 2021).
Fusing Neural Networks with Symbolic Models: Combining physically meaningful priors, symmetries, or computational graph constraints with learned representations, including for data-scarce and transfer learning settings.
Automated Model Selection and Adaptivity: Exploiting automated architecture search, adaptive mesh refinement, or online hyperparameter tuning to enhance robustness and minimize manual intervention.
Hardware Co-Design: Continued integration of neural algorithmic frameworks with custom hardware (optical, neuromorphic, FPGA) for ultra-fast, energy-efficient scientific computing (Lin et al., 2023).

Neural network-based methods thus represent a unifying, extensible paradigm for complex modeling, simulation, and inference problems in applied mathematics, engineering, and data science. Their ongoing evolution reflects the interplay between theory, algorithm design, empirical validation, and hardware advances, with significant potential for further transformative impact across disciplines.