Artificial Neural Networks (ANNs)
- Artificial neural networks (ANNs) are computational models of interconnected simple units inspired by biological neurons, enabling complex tasks such as image recognition and language processing.
- They utilize various architectures, such as feedforward, convolutional, recurrent, and spiking networks, each optimized for different data structures and applications.
- Their training involves optimization techniques like backpropagation, while recent advances leverage biologically plausible learning and efficient parameter designs to enhance performance.
Artificial neural networks (ANNs) are parameterized computational models consisting of interconnected simple units—artificial “neurons”—whose weighted connections are iteratively updated by optimization algorithms. ANNs are inspired by the integrative, nonlinear, and plastic properties of biological neural circuits. They serve both as general-purpose machine learning engines and as modeling tools in neuroscience, neurobiology, engineering, and computational sciences. The architecture, learning dynamics, and representational capacities of ANNs have evolved over decades, enabling state-of-the-art performance across domains such as vision, language, robotics, medicine, and theoretical neuroscience (Yang et al., 2020, Chavlis et al., 4 Apr 2024, Nwadiugwu, 2020, Schmidgall et al., 2023).
1. Mathematical Structures and Biological Foundations
Artificial neurons operate by aggregating synaptic inputs, applying a nonlinearity, and transmitting the output to downstream units. Mathematically, the output of neuron $i$ is given by

$$y_i = f\!\left(\sum_j w_{ij} x_j + b_i\right)$$

where $w_{ij}$ is the weight from input $j$ to neuron $i$, $b_i$ is a bias, $f$ is a nonlinear activation function (e.g., ReLU, $\tanh$, sigmoid), and $x_j$ denotes inputs from previous-layer neurons or raw data. Biological motivation traces to the summation of synaptic currents, nonlinear spike generation, and the organization of circuits into layers and columns (Yang et al., 2020, Nwadiugwu, 2020, Schmidgall et al., 2023).
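As a concrete illustration of this computation, the following minimal sketch (plain NumPy, with purely hypothetical weights and inputs) evaluates one artificial neuron with a ReLU nonlinearity.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def neuron_output(x, w, b, activation=relu):
    """Output of one artificial neuron: f(sum_j w_j * x_j + b)."""
    return activation(np.dot(w, x) + b)

# Hypothetical inputs, weights, and bias for illustration.
x = np.array([0.5, -1.2, 3.0])   # inputs from previous layer or raw data
w = np.array([0.8, 0.1, -0.4])   # synaptic weights
b = 0.2                          # bias

print(neuron_output(x, w, b))    # a single scalar activation
```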
Hebbian learning ($\Delta w_{ij} \propto x_j y_i$) and spike-timing–dependent plasticity (STDP) are paradigmatic examples of biologically inspired learning rules (Schmidgall et al., 2023). ANN training usually relies on backpropagation, whereby the loss gradient is efficiently propagated backward through the network to update all parameters:

$$\theta \leftarrow \theta - \eta \, \nabla_{\theta} \mathcal{L}(\theta)$$

where $\theta$ aggregates all weights and biases, $\eta$ is the learning rate, and $\mathcal{L}$ is a differentiable loss function quantifying prediction error.
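A minimal sketch of this gradient-descent update, assuming a one-parameter quadratic loss purely for illustration; real networks obtain the gradient over all weights and biases via reverse-mode automatic differentiation (backpropagation).

```python
import numpy as np

def loss(theta, x, y):
    """Squared-error loss of a linear model y_hat = theta * x."""
    return 0.5 * (theta * x - y) ** 2

def grad(theta, x, y):
    """Analytic gradient dL/dtheta of the loss above."""
    return (theta * x - y) * x

theta = 0.0      # initial parameter
eta = 0.1        # learning rate
x, y = 2.0, 4.0  # one hypothetical training example

for step in range(50):
    theta -= eta * grad(theta, x, y)   # theta <- theta - eta * dL/dtheta

print(theta)  # approaches 2.0, the minimizer of the loss
```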
2. Core Architectures and Computational Regimes
ANNs comprise several canonical architectures, each tailored to the structure of the input data and application domain:
- Feedforward (Multilayer Perceptron, MLP): Layers of units compute successive affine-nonlinear transformations, guaranteeing universal function approximation for sufficiently wide networks (Yang et al., 2020, Chen et al., 2017). Every layer computes

$$\mathbf{h}^{(l)} = f\!\left(W^{(l)} \mathbf{h}^{(l-1)} + \mathbf{b}^{(l)}\right)$$

with input $\mathbf{h}^{(0)} = \mathbf{x}$ and output $\hat{\mathbf{y}} = \mathbf{h}^{(L)}$ (a sketch of this layer, alongside a recurrent step, follows this list).
- Convolutional Neural Networks (CNNs): Specialized for spatial data, with parameter sharing and local receptive fields that enforce translation invariance and significantly reduce parameter count. Each feature map is computed by convolving the previous layer's activations with a shared kernel,

$$h^{(l)}_{i} = f\!\left(\sum_{k} w^{(l)}_{k}\, h^{(l-1)}_{i+k} + b^{(l)}\right)$$

which emulates the local wiring of visual cortex (Yang et al., 2020, Chavlis et al., 4 Apr 2024).
- Recurrent Neural Networks (RNNs): Designed for sequential and temporal processing. The recurrent update is

$$\mathbf{h}_t = f\!\left(W_{\mathrm{rec}} \mathbf{h}_{t-1} + W_{\mathrm{in}} \mathbf{x}_t + \mathbf{b}\right)$$

Attractor, oscillatory, and chaotic dynamics are all possible, supporting models of memory and decision-making (Yang et al., 2020).
- Spiking Neural Networks (SNNs): Employ leaky-integrate-and-fire neuron models and spike-driven updates. Learning algorithms may use surrogate gradients or event-driven local rules, with emergent temporal coding properties (Walter et al., 24 Mar 2024, Schmidgall et al., 2023).
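The following sketch (NumPy, with hypothetical dimensions and random weights) contrasts one feedforward layer with one recurrent time step, mirroring the two update equations above.

```python
import numpy as np

rng = np.random.default_rng(0)
tanh = np.tanh

# --- Feedforward (MLP) layer: h_l = f(W h_{l-1} + b) ---
def mlp_layer(h_prev, W, b):
    return tanh(W @ h_prev + b)

# --- Recurrent step: h_t = f(W_rec h_{t-1} + W_in x_t + b) ---
def rnn_step(h_prev, x_t, W_rec, W_in, b):
    return tanh(W_rec @ h_prev + W_in @ x_t + b)

# Hypothetical sizes and random parameters, for illustration only.
d_in, d_hid = 4, 8
W = rng.normal(size=(d_hid, d_in))
b = np.zeros(d_hid)
print(mlp_layer(rng.normal(size=d_in), W, b).shape)   # (8,)

W_rec = rng.normal(size=(d_hid, d_hid)) * 0.1
W_in = rng.normal(size=(d_hid, d_in))
h = np.zeros(d_hid)
for t in range(5):                        # unroll over a short input sequence
    h = rnn_step(h, rng.normal(size=d_in), W_rec, W_in, b)
print(h.shape)                            # (8,)
```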
Layer normalization, batch normalization, attention modules, gating mechanisms (e.g., LSTM/GRU), and structurally constrained connectivity (such as Dale’s law for excitatory/inhibitory separation) allow precise control of information flow and stabilization of deep learning (Yang et al., 2020).
3. Learning Paradigms and Plasticity
ANNs can be trained under several optimization regimes:
- Supervised Learning: Uses labeled data and losses such as the mean squared error

$$\mathcal{L} = \frac{1}{N} \sum_{n=1}^{N} \lVert \mathbf{y}_n - \hat{\mathbf{y}}_n \rVert^2$$

or the cross-entropy for classification (Yang et al., 2020, Schmidgall et al., 2023, Chen et al., 2017).
- Reinforcement Learning: Model-free or model-based scenarios where networks optimize expected cumulative reward via policy or value-based updates; deep RL often couples CNNs or RNNs with actor-critic or Q-learning frameworks (Yang et al., 2020).
- Unsupervised Learning: Autoencoders, principal component analysis (implemented via Hebbian rules), and generative models (e.g., variational autoencoders) are trained by minimizing reconstruction error or maximizing (a bound on) the data likelihood (Yang et al., 2020, Chen et al., 2017).
- Biologically Plausible Learning: Local plasticity rules (Hebbian, STDP, three-factor), activity-dependent homeostasis, and meta-learned update equations are being actively explored. For STDP, the weight change depends on the relative spike timing $\Delta t = t_{\mathrm{post}} - t_{\mathrm{pre}}$:

$$\Delta w = \begin{cases} A_{+}\, e^{-\Delta t / \tau_{+}}, & \Delta t > 0 \\ -A_{-}\, e^{\Delta t / \tau_{-}}, & \Delta t < 0 \end{cases}$$

Robust continual learning relies on a hierarchy of plasticity mechanisms (Schmidgall et al., 2023, Krishnan et al., 2019); a minimal sketch of this STDP window follows this list.
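A minimal sketch of the exponential STDP window above (NumPy); the amplitudes and time constants are illustrative assumptions, not values taken from the cited papers.

```python
import numpy as np

def stdp_dw(delta_t, A_plus=0.01, A_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Weight change as a function of spike-time difference dt = t_post - t_pre (ms).

    Pre-before-post (dt > 0) potentiates; post-before-pre (dt < 0) depresses.
    """
    if delta_t > 0:
        return A_plus * np.exp(-delta_t / tau_plus)
    elif delta_t < 0:
        return -A_minus * np.exp(delta_t / tau_minus)
    return 0.0

for dt in (-40, -10, 10, 40):
    print(dt, stdp_dw(dt))   # small depression/potentiation far apart, large near-coincident
```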
4. Structural Innovations and Parameter Efficiency
New advances in network topology draw explicitly on biological microcircuit features:
- Dendritic ANNs (dANNs): Each neuron is decomposed into dendritic subunits with sparse receptive fields and masked connectivity, mirroring the local sampling of dendrites. The activation of dendritic branch $d$ of soma $j$ is

$$a_{jd} = f\!\left(\sum_{i \in R_{jd}} w_{jdi}\, x_i + b_{jd}\right)$$

where $R_{jd}$ is the branch's sparse receptive field, and branch outputs are aggregated at the soma:

$$y_j = g\!\left(\sum_{d} v_{jd}\, a_{jd} + b_j\right)$$
dANNs match or exceed the accuracy of classical fully connected networks on benchmark tasks (MNIST, FMNIST, KMNIST, EMNIST, CIFAR10) while using 1–3 orders of magnitude fewer parameters, and they exhibit robustness to noise and class imbalance (Chavlis et al., 4 Apr 2024); a minimal sketch of such a masked dendritic layer follows this list.
- Artificial Neural Microcircuits (ANMs): Modular, motif-driven small SNN blocks defined by block-diagonal connectivity matrices, assembled and evolved for specialized behaviors using novelty search and spike-train distance metrics. ANMs facilitate reuse, control over complexity, and hardware mappings in neuromorphic systems (Walter et al., 24 Mar 2024).
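A minimal sketch of the dendritic idea above, assuming a random binary mask to give each branch a sparse receptive field; the exact masking and pooling scheme of (Chavlis et al., 4 Apr 2024) may differ.

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)

def dendritic_layer(x, W, mask, V, b_d, b_s):
    """Dendritic branches with masked (sparse) inputs, pooled at the soma.

    W, mask : (n_soma, n_branch, n_in)  branch weights and binary receptive-field mask
    V       : (n_soma, n_branch)        branch-to-soma weights
    """
    branch = relu(np.einsum('sbi,i->sb', W * mask, x) + b_d)  # per-branch nonlinearity
    return relu(np.einsum('sb,sb->s', V, branch) + b_s)       # somatic aggregation

# Hypothetical sizes: 16 inputs, 4 somata, 3 branches each, 25% connectivity.
n_in, n_soma, n_branch = 16, 4, 3
mask = (rng.random((n_soma, n_branch, n_in)) < 0.25).astype(float)
W = rng.normal(size=(n_soma, n_branch, n_in))
V = rng.normal(size=(n_soma, n_branch))
b_d = np.zeros((n_soma, n_branch))
b_s = np.zeros(n_soma)

print(dendritic_layer(rng.normal(size=n_in), W, mask, V, b_d, b_s).shape)  # (4,)
```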
5. Internal Representations, Concept Encoding, and Generalization
ANNs develop distributed, mixed-selectivity representations, often diverging from stringent single-unit concept encoding:
- Mixed vs. Class-Specific Selectivity: Nodes in dendritic/structured networks typically respond to multiple classes; class-activation hit entropy and selectivity indices show that this mixed selectivity parallels biological cortex and improves generalization, whereas classical ANNs favor early class-specific encoding, which can foster overfitting and catastrophic interference (Chavlis et al., 4 Apr 2024, Freiesleben, 2023). A sketch of a simple selectivity index follows this list.
- Concept Representation: Empirical tools (activation maximization, network dissection, TCAV) indicate that while ANNs achieve human-level task accuracy, concepts (“car,” “metro”) are not stored in localizable neurons but rather in distributed population codes. Functional necessity for concepts in single units is generally weak; rigorous interpretability demands coactivation and ablation testing (Freiesleben, 2023).
- Relations vs. Items: Analytical and empirical results in small networks reveal that linear networks tend to encode relational symmetries (e.g., permutation invariance) and generalize to unseen compatible items. Nonlinear networks (sigmoid, ReLU) favor associative memory of individual training items, limiting generalization unless the activation function possesses a sizable linear regime (e.g., $\tanh$ near the origin) (Krause et al., 15 Apr 2024).
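One common way to quantify single-unit selectivity is the gap between a unit's strongest class response and its mean response to the remaining classes. The sketch below (NumPy, synthetic activations) implements this widely used index as an illustration; it is not necessarily the exact metric used in the cited studies.

```python
import numpy as np

def class_selectivity_index(mean_act):
    """Selectivity of one unit from its mean activation per class.

    (mu_max - mu_rest) / (mu_max + mu_rest), in [0, 1] for non-negative activations:
    near 0 = mixed selectivity, near 1 = responds mostly to a single class.
    """
    mean_act = np.asarray(mean_act, dtype=float)
    mu_max = mean_act.max()
    mu_rest = np.delete(mean_act, mean_act.argmax()).mean()
    return (mu_max - mu_rest) / (mu_max + mu_rest + 1e-12)

print(class_selectivity_index([0.9, 0.1, 0.1, 0.1]))  # high: class-specific unit
print(class_selectivity_index([0.5, 0.4, 0.6, 0.5]))  # low: mixed-selectivity unit
```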
6. Neuroscience Integration, Theory, and Loss Landscape Geometry
ANNs provide testable mathematical models for neuroscientific hypotheses:
- Sensory and Cognitive Modeling: CNN intermediate layers predict responses in visual cortex areas (V4, IT), while RNNs model context-dependent decision-making and working memory attractor dynamics paralleling prefrontal and parietal circuit recordings (Yang et al., 2020).
- Loss Landscapes: The generalization and optimization of deep ANNs are shaped by the geometry of high-dimensional loss surfaces. Statistical mechanics connects Hopfield networks and Boltzmann machines to Ising models and thermodynamic principles. Flat minima (wide basins) are associated with enhanced generalization due to the high entropy of robust parameter configurations, whereas sharp minima tend to yield poor test performance and can be artificially induced by batch-size choices or reparameterization (Böttcher et al., 5 Apr 2024). A simple perturbation-based flatness probe is sketched after this list.
- Brain-Inspired Algorithms: Sleep-inspired offline STDP phases reduce catastrophic forgetting, foster forward transfer, and improve noise robustness in incremental learning scenarios, by decorrelating internal representations and reactivating previously suppressed pathways (Krishnan et al., 2019). Meta-learning of plasticity rules and combining multiple modalities (short-term, long-term) represent active directions (Schmidgall et al., 2023).
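A minimal sketch of a flatness proxy, assuming access to a loss function and a parameter vector: measure the average loss increase under small random parameter perturbations, which stays small in wide (flat) basins and grows large at sharp minima. This is an illustrative diagnostic, not the analysis of (Böttcher et al., 5 Apr 2024).

```python
import numpy as np

def flatness_proxy(loss_fn, theta, sigma=0.01, n_samples=100, seed=0):
    """Average loss increase under Gaussian parameter perturbations of scale sigma."""
    rng = np.random.default_rng(seed)
    base = loss_fn(theta)
    increases = [loss_fn(theta + sigma * rng.normal(size=theta.shape)) - base
                 for _ in range(n_samples)]
    return float(np.mean(increases))

# Two toy 1-D losses whose minima at theta = 0 have very different curvature.
flat_loss  = lambda th: float(0.5 * th @ th)    # shallow quadratic (wide basin)
sharp_loss = lambda th: float(50.0 * th @ th)   # steep quadratic (sharp minimum)

theta_star = np.zeros(1)
print(flatness_proxy(flat_loss, theta_star))    # small average increase
print(flatness_proxy(sharp_loss, theta_star))   # ~100x larger increase
```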
7. Key Application Domains and Limitations
ANNs are foundational across application sectors:
- Medical and Biotech: Pattern recognition in anomaly detection, electronic noses, and pharmaceutical modeling leverages layered ANN architectures, achieving sensitivity and specificity competitive with expert human diagnosticians (Nwadiugwu, 2020, Xu et al., 2018).
- Signal Processing and Control: Autonomous driving, hearing aids, and robotics rely on CNNs and RNNs for real-time pattern extraction and adaptive control (Nwadiugwu, 2020, Chen et al., 2017).
- Wireless Networks: Diverse ANN architectures (FNN, CNN, RNN, SNN) optimize caching, virtual-reality resource allocation, and multi-RAT management in next-generation wireless systems (Chen et al., 2017). Echo state networks with auditory cortex–inspired reservoirs outperform standard ESNs on dense time-series forecasting, offering computational efficiency and robustness (Matysiak et al., 15 Apr 2024); a minimal ESN sketch follows this list.
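A minimal echo state network sketch in NumPy, assuming a generic random reservoir rescaled to a spectral radius below one and a ridge-regression readout; the auditory cortex-inspired reservoir structure of (Matysiak et al., 15 Apr 2024) is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Reservoir: fixed random recurrent weights, rescaled to spectral radius 0.9.
n_res, n_in = 200, 1
W_res = rng.normal(size=(n_res, n_res))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))
W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))

def run_reservoir(u):
    """Drive the reservoir with input sequence u (T, n_in); return states (T, n_res)."""
    states = np.zeros((len(u), n_res))
    x = np.zeros(n_res)
    for t, u_t in enumerate(u):
        x = np.tanh(W_res @ x + W_in @ u_t)
        states[t] = x
    return states

# Toy task: one-step-ahead prediction of a sine wave.
u = np.sin(0.1 * np.arange(1000))[:, None]
X, y = run_reservoir(u)[:-1], u[1:, 0]

# Ridge-regression readout (only these weights are trained).
lam = 1e-6
W_out = np.linalg.solve(X.T @ X + lam * np.eye(n_res), X.T @ y)
print(np.mean((X @ W_out - y) ** 2))  # small training MSE on the toy sequence
```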
Despite broad success, limitations remain in theoretical understanding, interpretability, integration of truly biologically plausible plasticity, and transition from monolithic deep architectures to modular assembly as exemplified by ANMs and dendritic networks (Diamant, 2018, Freiesleben, 2023, Walter et al., 24 Mar 2024, Schmidgall et al., 2023). Advances in statistically grounded learning theory, biologically inspired regularization, and modularity are poised to define next-generation ANN research.