Complex-Valued Neural Networks
- CVNNs are neural networks with weights, biases, and activations expressed as complex numbers, enabling natural modeling of amplitude and phase.
- They employ specialized optimization techniques like Wirtinger calculus and diverse activation functions to adapt classical NN architectures for complex data.
- CVNNs excel in applications such as signal processing, imaging, and wireless communications by leveraging joint magnitude-phase effects for improved accuracy.
A complex-valued neural network (CVNN) is an artificial neural network in which the weights, biases, inputs, activations, and outputs are represented as complex numbers. This extension of classical real-valued neural networks is motivated by the natural occurrence of complex-valued data and transformations in domains such as signal processing, communications, image analysis, radar, and biomedical imaging. CVNNs offer representational fidelity for phenomena involving amplitude and phase, enable operations naturally invariant under rotations in the complex plane, and facilitate modeling of joint magnitude-phase effects fundamental to many scientific and engineering disciplines (2101.12249, 2407.19258).
1. Mathematical Foundations and Core Design Principles
A CVNN generalizes the classical neural network architecture by replacing real-valued objects with complex analogues and adapting the learning algorithms to the complex field. The forward operation of a CVNN layer is expressed as
$$\mathbf{z}^{(l)} = \sigma\!\left(\mathbf{W}^{(l)}\mathbf{z}^{(l-1)} + \mathbf{b}^{(l)}\right),$$
where $\mathbf{W}^{(l)} \in \mathbb{C}^{m \times n}$, $\mathbf{b}^{(l)} \in \mathbb{C}^{m}$, and $\sigma$ is a complex-valued activation function. In convolutional architectures, both filters and feature maps are complex (1503.03438, 2101.12249).
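A minimal sketch of such a layer using PyTorch's native complex tensors; the class name, scaling, and shapes are illustrative assumptions rather than a reference implementation from the cited papers.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Fully connected layer with complex-valued weights and biases (illustrative sketch)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Complex parameters: real and imaginary parts are both learnable.
        self.weight = nn.Parameter(
            torch.randn(out_features, in_features, dtype=torch.cfloat) / in_features ** 0.5
        )
        self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.cfloat))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z^(l) = W z^(l-1) + b, with every quantity complex; a complex
        # activation function would be applied to this pre-activation.
        return z @ self.weight.T + self.bias

layer = ComplexLinear(4, 2)
z = torch.randn(8, 4, dtype=torch.cfloat)   # batch of complex inputs
out = layer(z)                               # complex pre-activations, shape (8, 2)
```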
Gradient-based learning in CVNNs employs the Wirtinger derivatives to handle non-holomorphic functions:
$$\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right), \qquad \frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right),$$
where $z = x + iy$. For a real-valued loss function $L$, only the derivative with respect to the conjugate $\bar{z}$ is needed during backpropagation (1511.06351, 2312.06087).
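As a concrete worked example (an illustration, not taken from the cited papers), take $L(z) = |z|^2 = z\bar{z}$: the Wirtinger derivatives are $\partial L/\partial z = \bar{z}$ and $\partial L/\partial \bar{z} = z$, so the descent direction for this real-valued loss is $-z$. A short numerical check of the definitions:

```python
import numpy as np

def wirtinger_derivatives(f, z, h=1e-6):
    """Estimate dL/dz and dL/dz_bar from the definitions
    dL/dz     = 0.5 * (dL/dx - i * dL/dy)
    dL/dz_bar = 0.5 * (dL/dx + i * dL/dy),  with z = x + i*y,
    using central finite differences on the real and imaginary parts."""
    dLdx = (f(z + h) - f(z - h)) / (2 * h)
    dLdy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dLdx - 1j * dLdy), 0.5 * (dLdx + 1j * dLdy)

L = lambda z: abs(z) ** 2            # real-valued loss |z|^2 = z * conj(z)
dz, dz_bar = wirtinger_derivatives(L, 3.0 + 4.0j)
print(dz, dz_bar)                    # ~ (3-4j) and (3+4j): dL/dz = conj(z), dL/dz_bar = z
```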
2. Complex-Valued Activation Functions
The selection of suitable complex-valued activation functions (CVAFs) is central to CVNN design because boundedness and analyticity (holomorphicity) cannot be achieved simultaneously by any nonconstant function on $\mathbb{C}$, per Liouville's theorem (2407.19258). Two major classes of CVAFs are used:
- Split Activation Functions: Apply a real-valued function separately to the real and imaginary parts, e.g., $\sigma(z) = f(\Re(z)) + i\,f(\Im(z))$. This class includes split-ReLU, split-Tanh, split-ELU, and their variants (1811.12351, 2407.19258).
- Fully Complex Activation Functions: Operate on the complex variable as a whole. Examples include the complex sigmoid, modReLU $\sigma(z) = \mathrm{ReLU}(|z| + b)\,e^{i\arg(z)}$ with a learnable offset $b$, the cardioid, amplitude-phase saturating nonlinearities, and newer forms such as fully complex Swish and Mish (2407.19258, 1802.08026, 1902.02085). Minimal sketches of one activation from each class follow this list.
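Illustrative PyTorch sketches of a split activation (split-ReLU) and a fully complex activation (modReLU); the small epsilon guarding the division is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def split_relu(z: torch.Tensor) -> torch.Tensor:
    # Split activation: apply ReLU to the real and imaginary parts independently.
    return torch.complex(F.relu(z.real), F.relu(z.imag))

def mod_relu(z: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # modReLU: thresholds the magnitude while preserving the phase,
    # sigma(z) = ReLU(|z| + b) * z / |z|, with a (learnable) offset b.
    mag = torch.abs(z)
    return F.relu(mag + b) * z / (mag + 1e-9)   # epsilon avoids division by zero

z = torch.randn(5, dtype=torch.cfloat)
print(split_relu(z))
print(mod_relu(z, torch.tensor(-0.1)))
```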
Non-parametric activation function families, notably kernel activation functions (KAFs), have been constructed directly in the complex domain, allowing data-driven adaptation of the nonlinearities (1802.08026, 1902.02085). The widely linear extension (WL-KAF) increases expressive power by incorporating pseudo-kernel terms, without increasing parameter count (1902.02085).
3. Learning and Optimization Techniques
CVNN optimization closely parallels that of RVNNs but requires complex-aware adaptations:
- Wirtinger Calculus: Used to compute gradients of non-holomorphic functions, facilitating standard backpropagation even for non-analytic CVAFs (1511.06351, 2312.06087).
- Complex Backpropagation: Three main variants are reported (2407.19258):
- Complex Derivative Approach: Requires analytic activations; rarely practical for bounded nonlinearities.
- Partial Derivative (Split) Approach: Differentiates with respect to the real and imaginary parts independently (a minimal sketch follows this list).
- Wirtinger-Based with Cauchy-Riemann Equations: Enforces or exploits analytic structure when present.
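A minimal sketch of the split approach, assuming PyTorch: the complex weight is stored as two real parameters and optimized through the partial derivatives with respect to each part; the toy objective and names are illustrative.

```python
import torch

# Split approach: keep the real and imaginary parts as separate real parameters.
w_re = torch.tensor(0.5, requires_grad=True)
w_im = torch.tensor(-0.3, requires_grad=True)
target = torch.complex(torch.tensor(1.0), torch.tensor(2.0))

opt = torch.optim.SGD([w_re, w_im], lr=0.1)
for _ in range(100):
    w = torch.complex(w_re, w_im)      # reassemble the complex weight
    loss = (w - target).abs() ** 2     # real-valued loss
    opt.zero_grad()
    loss.backward()                    # gradients w.r.t. w_re and w_im separately
    opt.step()

print(torch.complex(w_re, w_im))       # converges toward 1 + 2j
```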
Regularization and Initialization: Initialization schemes (e.g., complex Glorot) must ensure the desired variance over both the real and imaginary components (2302.08286, 2312.06087). Online regularization methods ($\ell_1$, $\ell_{2,1}$) enable adaptive sparsification and model selection in nonstationary environments, such as wireless channel prediction (1901.10121).
Optimization methods must carefully manage magnitude and phase information to preserve signal properties; convergence can be hampered by poorly chosen initializations or inappropriate nonlinearities (2302.08286, 1811.12351).
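A sketch of a polar (Rayleigh) complex Glorot-style initializer; targeting Var(W) = 2/(fan_in + fan_out) mirrors the real-valued Glorot criterion and is an assumption of this sketch rather than a prescription from the cited papers.

```python
import numpy as np

def complex_glorot_init(fan_in: int, fan_out: int, rng=None):
    """Draw magnitudes from a Rayleigh distribution and phases uniformly,
    so that E[|W|^2] = 2 * scale^2 = 2 / (fan_in + fan_out)."""
    rng = rng or np.random.default_rng()
    scale = 1.0 / np.sqrt(fan_in + fan_out)
    magnitude = rng.rayleigh(scale, size=(fan_out, fan_in))
    phase = rng.uniform(-np.pi, np.pi, size=(fan_out, fan_in))
    return magnitude * np.exp(1j * phase)

W = complex_glorot_init(128, 64)
print(W.dtype, W.var())   # complex128, variance close to 2 / (128 + 64)
```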
4. Expressivity, Approximation, and Generalization
Both theoretical and empirical studies demonstrate that CVNNs are universal approximators for complex-valued functions under non-degenerate activation choices. For deep CVNNs (more than one hidden layer), universality holds unless the activation is almost everywhere holomorphic, antiholomorphic, or a polynomial in $z$ and $\bar{z}$ (2012.03351).
Quantitative approximation results show that CVNNs with activations such as modReLU or cardioid achieve optimal error rates for approximating $C^k$-smooth functions on compact subsets of $\mathbb{C}^n$, with the approximation error scaling as $m^{-k/(2n)}$ in the number of neurons $m$, matching the rate of real-valued networks once the doubled real dimension $2n$ is accounted for (2303.16813, 2102.13092). In terms of network depth, CVNNs may require fewer layers than their real-valued counterparts to achieve the same error when approximating continuous complex-valued mappings (2502.11151).
Generalization bounds for CVNNs scale with the spectral complexity, defined via the product of the weight matrices' spectral norms; empirically, spectral complexity correlates strongly with out-of-sample error (2112.03467).
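A small sketch of the spectral-norm product underlying this complexity measure (the full bound involves additional norm terms as well); the layer shapes are illustrative.

```python
import torch

def spectral_norm_product(weights):
    # Product of spectral norms (largest singular values) of complex weight matrices.
    norms = [torch.linalg.matrix_norm(W, ord=2) for W in weights]
    return torch.prod(torch.stack(norms))

weights = [torch.randn(64, 32, dtype=torch.cfloat),
           torch.randn(32, 16, dtype=torch.cfloat)]
print(spectral_norm_product(weights))
```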
5. Architectural Diversity and Implementational Considerations
CVNNs encompass fully connected, convolutional, recurrent, and transformer-based architectures. Recent work has established the practical implementation of complex-valued transformers, with custom adaptations of attention mechanisms (e.g., using the real part of the complex inner product for softmax scoring) to preserve phase information (2502.11151). Deep complex-valued RBF networks (C-RBF) with specialized parameter initialization schemes achieve robust convergence in high-noise environments and complex-valued signal processing (2408.16778).
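A minimal sketch of complex scaled dot-product attention in which the softmax scores are the real part of the complex inner product, as described above; the function name, shapes, and scaling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def complex_attention(q, k, v):
    """q, k, v: complex tensors of shape (batch, seq, d).
    Scores are Re(q k^H), so the softmax weights stay real-valued
    while the attended values remain complex."""
    d = q.shape[-1]
    scores = torch.einsum('bqd,bkd->bqk', q, k.conj()).real / d ** 0.5
    attn = F.softmax(scores, dim=-1)                      # real attention weights
    return torch.einsum('bqk,bkd->bqd', attn.to(v.dtype), v)

q = torch.randn(2, 5, 8, dtype=torch.cfloat)
k = torch.randn(2, 5, 8, dtype=torch.cfloat)
v = torch.randn(2, 5, 8, dtype=torch.cfloat)
out = complex_attention(q, k, v)                          # complex, shape (2, 5, 8)
```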
Specialized modules include:
- Complex Batch Normalization: Jointly whitens the real and imaginary channels; the covariance and mean are computed over the complex vector, often modeled as a 2D real vector (a minimal sketch follows this list) (2312.06087).
- Complex Pooling/Dropout: Operate on the magnitude or preserve the phase, and are therefore modified relative to their real-valued counterparts (2302.08286).
- Complex Weight Initialization: Ensures appropriate scaling of real/imaginary parts, using Rayleigh (polar) or normal (rectangular) approaches (2312.06087, 2302.08286).
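A minimal sketch of whitening-style complex batch normalization over a (batch, features) tensor, treating each complex value as a 2D real vector; the learnable affine parameters are omitted, and the eigendecomposition-based inverse square root is an implementation choice of this sketch.

```python
import torch

def complex_batch_norm(z: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Centre each feature's (real, imag) pair and whiten it by the inverse
    square root of its 2x2 covariance, computed over the batch dimension."""
    x = torch.stack([z.real, z.imag], dim=-1)          # (batch, features, 2)
    x = x - x.mean(dim=0, keepdim=True)                # joint real/imag centring
    cov = torch.einsum('bfi,bfj->fij', x, x) / z.shape[0]   # per-feature 2x2 covariance
    cov = cov + eps * torch.eye(2)
    # Inverse matrix square root via eigendecomposition (cov is symmetric PSD).
    evals, evecs = torch.linalg.eigh(cov)
    inv_sqrt = evecs @ torch.diag_embed(evals.clamp_min(eps).rsqrt()) @ evecs.transpose(-1, -2)
    x_w = torch.einsum('fij,bfj->bfi', inv_sqrt, x)    # whitened 2D vectors
    return torch.complex(x_w[..., 0], x_w[..., 1])

z = torch.randn(256, 16, dtype=torch.cfloat) * (2 + 1j) + (3 - 1j)
out = complex_batch_norm(z)   # per-feature real/imag parts now zero-mean with identity covariance
```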
Efficient implementation is supported in frameworks such as TensorFlow and PyTorch, which now provide complex data types and automatic differentiation over them (2312.06087, 2302.08286). Open-source libraries have accelerated adoption and reproducibility (2009.08340, 2302.08286).
6. Application Domains
The application of CVNNs is especially prevalent where data is naturally or transformatively complex:
- Signal Processing: Speech enhancement, music analysis, radar and sonar, channel estimation, device identification, and equalization tasks. CVNNs explicitly leverage the I/Q (in-phase/quadrature) representation of RF signals, leading to higher accuracy in device fingerprinting and channel state monitoring (2202.09777, 2408.16778).
- Imaging: MRI fingerprinting, optics (wavefront modulation), holography, phase retrieval, and image reconstruction/denoising in complex Fourier domains (2102.13092, 2303.16813).
- Wireless Communications: Channel prediction, multi-user detection, beamforming, and joint pilot-precoder-quantization design. CVNN-based transformers achieve superior mean squared error, detection accuracy, and sum-rate optimization in 5G MIMO contexts, often with reduced computational complexity and fewer parameters compared to RVNN alternatives (2502.11151, 1901.10121).
- Classification Tasks: Non-circular complex input datasets (i.e., where real and imaginary parts are correlated or have different variances) benefit significantly from CVNNs, which outperform RVNNs and exhibit reduced overfitting (2009.08340).
7. Interpretability, Calibration, and Open Challenges
A recognized challenge in CVNN research is interpretability of the learned decision surfaces and reliable calibration of probabilistic outputs. Recent work adapts Newton–Puiseux theory to fit local polynomial surrogates to CVNN decision boundaries, decomposing them into fractional-power (Puiseux) series. Dominant Puiseux coefficients serve as phase-aligned curvature descriptors, enabling closed-form estimates of robustness and more precise temperature scaling for calibration. This approach yields calibration error reductions compared to conventional temperature scaling and reveals the intrinsic multi-sheeted, phase-sensitive structure of CVNN boundaries (2504.19176).
Several open questions and unresolved challenges remain:
- Bounded, Nonlinear, and Holomorphic Activation Functions: The development of CVAFs that balance analytic restrictions, numerical stability, and expressivity remains an active area of research (2407.19258, 2312.06087).
- Training Stability and Initialization: Sensitivity to weight initialization, architecture depth, and learning rates can hamper convergence; advanced or adaptive initialization and optimization approaches are called for (2302.08286, 1811.12351).
- Computational Complexity: Deep fully connected CVNNs incur substantially higher computational cost than real-valued networks of comparable size, since each complex multiply-accumulate expands into several real operations, necessitating architectural and algorithmic refinements for deployment in low-power or real-time applications (2310.13075).
- Library Support: Although modern libraries offer basic support for complex tensors and operations, development of robust, feature-complete tools tailored to CVNNs is ongoing (2312.06087, 2302.08286).
- Calibration and Robustness: Analytic frameworks capable of quantifying and improving CVNN reliability in safety-critical or high-uncertainty domains are in early stages (2504.19176).
Conclusion
Complex-Valued Neural Networks provide a mathematically principled and practically potent extension of classical neural models, enabling holistic modeling of signals where phase and amplitude are essential. Advances in activation function design, optimization theory, expressivity analysis, architectural innovation, and interpretability have accelerated adoption in several scientific and engineering fields. Ongoing research aims to resolve foundational challenges in nonlinearity, training dynamics, implementation, and theoretical understanding, ensuring CVNNs continue to evolve as indispensable tools for modeling and processing complex-valued phenomena.