
Complex-Valued Neural Networks

Updated 18 November 2025
  • Complex-Valued Neural Networks are advanced models that operate with complex numbers to capture both amplitude and phase, making them ideal for signal processing domains.
  • They use specialized methods like Wirtinger calculus, split and polar activations, and spectral normalization to enhance training and generalization.
  • Empirical studies show CVNNs excel in MRI, radar, and seismic signal applications where phase information is critical for accurate modeling.

Complex-Valued Neural Networks (CVNNs) extend neural network architectures to the complex domain, treating both parameters and intermediate representations as elements of ℂ rather than ℝ. This extension is highly significant for domains characterized by phase and amplitude behaviors—such as signal processing, communications, radar, MRI, and others—where data are innately complex-valued and the underlying phenomena demand joint modeling of amplitude and phase. Theoretical advances in the past decade have established both the expressive power and several practical guidelines for building, optimizing, and interpreting CVNNs (Voigtlaender, 2020, Geuchen et al., 2023, Bassey et al., 2021).

1. Mathematical Framework of CVNNs

A complex-valued neural network replaces all real-valued quantities—inputs, weights, biases, activations, and outputs—with their complex analogues. The fundamental operation in a complex neuron is

$$z = \mathbf{w}^\top \mathbf{x} + b, \qquad y = \varphi(z)$$

with $\mathbf{x},\mathbf{w}\in\mathbb{C}^n$, $b,z,y\in\mathbb{C}$, and $\varphi:\mathbb{C}\to\mathbb{C}$ an activation function. In convolutional settings, kernels and feature maps become complex tensors, and convolution is carried out by splitting input and weight into real and imaginary parts, i.e., for $x = x_r + i x_i$ and $w = w_r + i w_i$:

$$(w * x)_r = w_r * x_r - w_i * x_i, \qquad (w * x)_i = w_r * x_i + w_i * x_r.$$

This structure preserves algebraic and geometric properties critical to domains with oscillatory or wave-like signals.
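
For concreteness, here is a minimal NumPy sketch (illustrative only; the helper name `complex_linear` is ours, not from the cited works) that implements this split real/imaginary arithmetic for a dense layer and checks it against native complex matrix multiplication:

```python
import numpy as np

def complex_linear(x, w, b):
    """Complex affine map z = w x + b computed from real and imaginary parts.

    Equivalent to w @ x + b with native complex dtypes; the split form mirrors
    how CVNN layers are typically realized on real-valued backends.
    """
    zr = w.real @ x.real - w.imag @ x.imag + b.real   # (wx)_r = w_r x_r - w_i x_i
    zi = w.real @ x.imag + w.imag @ x.real + b.imag   # (wx)_i = w_r x_i + w_i x_r
    return zr + 1j * zi

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)
w = rng.normal(size=(3, 4)) + 1j * rng.normal(size=(3, 4))
b = rng.normal(size=3) + 1j * rng.normal(size=3)

# The split computation agrees with native complex arithmetic.
assert np.allclose(complex_linear(x, w, b), w @ x + b)
```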

For learning, the loss function $L:\mathbb{C}^n\to\mathbb{R}$ (e.g., mean squared error, cross-entropy) is optimized using Wirtinger calculus. Treating complex variables and their conjugates as independent, the update step for a weight $w$ is

$$w \gets w - \eta \frac{\partial L}{\partial w^*}$$

where $\frac{\partial}{\partial w^*}$ denotes the Wirtinger derivative (Abdalla, 2023, Hammad, 27 Jul 2024). Backpropagation through non-holomorphic activations and real-valued losses requires chain rules for both the $z$ and $z^*$ components.
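
To make the update rule concrete, the following hedged NumPy sketch fits a single linear complex neuron to a target under squared error; the gradients $\partial L/\partial w^* = (z-t)\,\bar{x}$ and $\partial L/\partial b^* = z-t$ follow directly from treating $z$ and $z^*$ as independent, and the step size and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8) + 1j * rng.normal(size=8)   # fixed complex input
t = 0.3 - 0.7j                                     # target output
w = rng.normal(size=8) + 1j * rng.normal(size=8)   # complex weights
b = 0.0 + 0.0j
eta = 0.01

for _ in range(1000):
    z = w @ x + b                     # complex pre-activation (identity activation)
    # For L = |z - t|^2, Wirtinger calculus gives dL/dw* = (z - t) * conj(x)
    # and dL/db* = (z - t); descend along the conjugate-coordinate gradient.
    w = w - eta * (z - t) * np.conj(x)
    b = b - eta * (z - t)

print(abs(w @ x + b - t))             # residual shrinks toward zero
```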

2. Activation Functions in the Complex Domain

The design of nonlinearity in CVNNs is governed by a sharp constraint: Liouville's theorem precludes the existence of bounded, nontrivial, entire (holomorphic on ℂ) activation functions. This constraint yields four primary classes:

  • Fully complex (holomorphic): e.g., $\tanh(z)$, $\sigma(z)=(1+e^{-z})^{-1}$; these are analytic except at isolated poles and thus unbounded or singular (Hammad, 27 Jul 2024, Abdalla, 2023).
  • Split activations: nonlinearities are applied separately to the real and imaginary parts, $\varphi(z) = f_\Re(\Re z) + i f_\Im(\Im z)$. Examples: split-ReLU, CReLU (Cole et al., 2020, Scardapane et al., 2018).
  • Phase-magnitude (polar): nonlinearity acts on the modulus, with phase preserved. Examples include modReLU, $\mathrm{modReLU}(z;b) = \max(0, |z|+b)\,(z/|z|)$, and the cardioid, $\frac{1+\cos(\arg z)}{2}\,z$ (Geuchen et al., 2023, Cole et al., 2020); see the sketch after this list.
  • Non-parametric activations: kernel activation functions (KAFs) build $\varphi(z)$ as a kernel expansion in ℂ with learned weights, yielding great flexibility but an increased parameter count (Scardapane et al., 2018).
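
A minimal NumPy sketch of the split and polar activations listed above (the bias value and the `eps` stabilizer are illustrative choices, not prescriptions from the cited papers):

```python
import numpy as np

def split_relu(z):
    """CReLU: ReLU applied separately to the real and imaginary parts."""
    return np.maximum(z.real, 0) + 1j * np.maximum(z.imag, 0)

def mod_relu(z, b=-0.5, eps=1e-9):
    """modReLU: shift the modulus by a (learnable) bias b, preserve the phase."""
    return np.maximum(np.abs(z) + b, 0) * z / (np.abs(z) + eps)

def cardioid(z):
    """Cardioid: scale z by (1 + cos(arg z)) / 2, passing inputs with arg z near 0
    and suppressing those with arg z near +/- pi."""
    return 0.5 * (1 + np.cos(np.angle(z))) * z

z = np.array([1 + 1j, -1 + 0.2j, 0.1 - 2j])
for f in (split_relu, mod_relu, cardioid):
    print(f.__name__, f(z))
```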

Optimal universal approximation requires the activation to be smooth and non-polyharmonic, thereby generating all mixed monomials $z^p \bar{z}^q$ necessary for dense approximation in $C^k$ function spaces (Geuchen et al., 2023, Voigtlaender, 2020).

3. Expressivity, Approximation Theorems, and Generalization

The expressivity of CVNNs has been characterized in terms of both universal approximation and quantitative rates. For an activation $\varphi$ that is $C^r$-smooth and non-polyharmonic, and a target $C^k$ function $f$ on a compact $K\subset\mathbb{C}^n$, there exists a one-hidden-layer CVNN $F_m$ with $m$ neurons such that the uniform approximation error scales as

$$\|f - F_m\|_{\infty, K} = O\big(m^{-k/(2n)}\big)$$

where $n$ is the complex input dimension and $k$ the smoothness of $f$ (Geuchen et al., 2023, Caragea et al., 2021). This rate mirrors the real case but with the effective dimensionality doubled ($\mathbb{C}^n\cong\mathbb{R}^{2n}$). Under mild continuity assumptions on parameter selection, these rates are optimal. However, the curse of dimensionality is fundamental: to achieve error $\epsilon$, one needs $m \gtrsim \epsilon^{-2n/k}$ neurons.

For generalization, the capacity of a CVNN is controlled by the product of spectral norms of the weight matrices and the Lipschitz constants of the activations (the "spectral complexity"). Formal bounds establish that the generalization gap scales in proportion to network spectral complexity, with empirical validation across multiple datasets (Chen et al., 2021). Explicit regularization of spectral norms (via spectral normalization or norm control) and careful choice of activation can reduce overfitting and improve generalization in practice.
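
As an illustration of spectral-norm control, the following generic power-iteration sketch (not the specific procedure of Chen et al., 2021) estimates and caps the largest singular value of a complex weight matrix:

```python
import numpy as np

def spectral_norm(w, n_iters=50, seed=0):
    """Estimate the largest singular value of a complex matrix by power iteration."""
    v = np.random.default_rng(seed).normal(size=w.shape[1]).astype(complex)
    for _ in range(n_iters):
        u = w @ v
        u = u / np.linalg.norm(u)
        v = w.conj().T @ u
        v = v / np.linalg.norm(v)
    return np.linalg.norm(w @ v)

def spectrally_normalize(w, target=1.0):
    """Rescale w so that its spectral norm does not exceed `target`."""
    sigma = spectral_norm(w)
    return w if sigma <= target else w * (target / sigma)

rng = np.random.default_rng(1)
w = rng.normal(size=(16, 8)) + 1j * rng.normal(size=(16, 8))
print(spectral_norm(w), np.linalg.svd(w, compute_uv=False)[0])  # should agree closely
```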

4. Architectures, Modules, and Optimization

Implementing CVNNs requires precise redesign of all core modules:

  • Linear/Conv layers: Weights and biases as pairs of real tensors; forward pass involves four real multiplications per complex multiplication. The Gauss trick can reduce this to three (Smith, 2023, Mayer et al., 2023).
  • Normalization: Batch and layer normalization in ℂ require whitening the joint covariance of $(\Re z, \Im z)$, typically by matrix square-root inversion. Alternatively, separate normalization per channel is a practical simplification (Abdalla, 2023, Smith, 2023).
  • Pooling: Max or average pooling per channel; phase-sensitive pooling is less common (Cole et al., 2020).
  • Attention and manifold layers: Multi-head attention and convolution over Riemannian homogeneous spaces extend naturally to complex inputs via dedicated modules (Smith, 2023).
  • Initialization: Proper complex initialization must scale the variance as $2/(n_\text{in}+n_\text{out})$, either by independent uniform draws for the real and imaginary parts or by sampling the modulus from a Rayleigh distribution and the phase uniformly (Abdalla, 2023, Barrachina et al., 2023); a sketch follows this list.
  • Backpropagation: Standard autograd tools exploit Wirtinger derivatives to propagate gradients through non-holomorphic functions and real-valued losses (Abdalla, 2023, Hammad, 27 Jul 2024).
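
The Rayleigh-based complex initialization mentioned in the list can be sketched as follows (a hedged illustration; the helper name `complex_glorot_init` is ours, with the Rayleigh scale derived from the target variance $2/(n_\text{in}+n_\text{out})$):

```python
import numpy as np

def complex_glorot_init(n_in, n_out, rng=None):
    """Complex Glorot-style init: modulus ~ Rayleigh(sigma), phase ~ Uniform(-pi, pi).

    A Rayleigh(sigma) modulus has E[|w|^2] = 2*sigma^2, so choosing
    sigma = 1/sqrt(n_in + n_out) gives the target variance 2/(n_in + n_out).
    """
    rng = rng or np.random.default_rng()
    sigma = 1.0 / np.sqrt(n_in + n_out)
    modulus = rng.rayleigh(scale=sigma, size=(n_out, n_in))
    phase = rng.uniform(-np.pi, np.pi, size=(n_out, n_in))
    return modulus * np.exp(1j * phase)

w = complex_glorot_init(128, 64)
print(np.var(w.real) + np.var(w.imag), 2 / (128 + 64))  # both approximately 0.0104
```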

Several libraries (Deep-Complex-Networks, cvnn for TensorFlow, native PyTorch complex modules post v1.6) provide partial support for these modules, but most research implementations require custom layer definitions (Abdalla, 2023, Smith, 2023, Barrachina et al., 2023).
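
Where native complex support is missing, a custom layer typically wraps two real modules. The following PyTorch sketch is a minimal illustration (not the API of any of the libraries above) of a complex linear layer with an explicit complex bias parameter:

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """Complex affine layer built from two real nn.Linear modules.

    Computes (W_r + i W_i)(x_r + i x_i) + b via four real matrix products,
    mirroring how research codebases realize complex layers on real backends.
    """
    def __init__(self, in_features, out_features):
        super().__init__()
        self.re = nn.Linear(in_features, out_features, bias=False)  # W_r
        self.im = nn.Linear(in_features, out_features, bias=False)  # W_i
        self.bias = nn.Parameter(torch.zeros(out_features, dtype=torch.cfloat))

    def forward(self, x):
        # x is a complex tensor of shape (..., in_features).
        xr, xi = x.real, x.imag
        yr = self.re(xr) - self.im(xi)   # W_r x_r - W_i x_i
        yi = self.re(xi) + self.im(xr)   # W_r x_i + W_i x_r
        return torch.complex(yr, yi) + self.bias

layer = ComplexLinear(8, 4)
z = torch.randn(2, 8, dtype=torch.cfloat)
print(layer(z).shape)  # torch.Size([2, 4])
```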

5. Empirical Results, Domains of Superiority, and Limitations

CVNNs demonstrate significant empirical advantages when the task's underlying structure involves intrinsic complex-valued phenomena. For example:

  • MRI fingerprinting: CVNNs with modReLU or cardioid activations achieve lower reconstruction error than real-valued networks matched in the number of real parameters (Cole et al., 2020).
  • Seismic and geophysical signals: Complex convolutions better capture phase-sensitive features, yielding higher classification accuracy and faster convergence; phase aliasing is suppressed compared to real-valued analogs (Dramsch et al., 2019).
  • Iris recognition: Fully complex nets with sector-based ReLU and complex batchnorm outperform both hand-crafted IrisCode and real-valued deep baselines (1.31% FRR vs 1.77% or 3.43% for real nets and classic codes, respectively) (Nguyen et al., 2020).
  • Image and patch matching: Encodings from complex-valued nets yield improved FPR95 in patch similarity, especially for tasks requiring detailed structure matching (Jiang et al., 2018).
  • Synthetic non-circular data: CVNNs generalize better, with higher accuracy and lower overfitting than real-valued nets of equal parameter count (Barrachina et al., 2020).
  • Complex-valued benchmarks: On DFT-transformed MNIST and radar datasets, CVNN architectures and structured real-valued surrogates such as Steinmetz/Analytic networks outperform even parameter-matched real-valued networks in both accuracy and robustness (Venkatasubramanian et al., 16 Sep 2024).

However, for strictly real-valued data, especially where phase carries no semantic weight, parameter-matched real-valued networks either match or outperform CVNNs (Mönning et al., 2018, Sarroff et al., 2015). On such problems, imaginary weights become redundant; their magnitude simply tracks the real components, resulting in doubled computation with no gain.

CVNNs typically require more careful tuning of initialization, learning rates, and regularization; they are numerically less robust to over-parameterization or poor activation choices; and they incur extra computational overhead, since each complex multiplication requires up to four real multiplications, or three with the Gauss trick (Mayer et al., 2023); see the sketch below. For low-power or edge deployment, parameter count and FLOPs must be evaluated precisely to justify adoption.
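
The three-multiplication variant is the classical Gauss trick; a minimal Python sketch (scalar case, with an illustrative numeric check) is:

```python
def complex_mul_gauss(wr, wi, xr, xi):
    """Complex product (wr + i*wi) * (xr + i*xi) using three real multiplications."""
    k1 = xr * (wr + wi)
    k2 = wr * (xi - xr)
    k3 = wi * (xi + xr)
    return k1 - k3, k1 + k2   # (real part, imaginary part)

# Sanity check against Python's native complex arithmetic.
print(complex_mul_gauss(1.5, -0.5, 2.0, 3.0))   # (4.5, 3.5)
print(complex(1.5, -0.5) * complex(2.0, 3.0))   # (4.5+3.5j)
```

The same identity applies elementwise to tensors, so it trades one real multiplication for extra additions inside complex convolutions.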

6. Recent Extensions and Research Directions

Recent work has addressed several advanced topics:

  • Non-parametric activation functions: Complex kernel activation functions (KAFs) enable neurons to learn rich, adaptive, and potentially locally holomorphic nonlinearities, at increased cost (Scardapane et al., 2018).
  • Information-theoretic generalization bounds: Structures such as Steinmetz and Analytic Neural Networks process real/imaginary parts in coupled real-valued subnets, with analytic-constraint regularization; this enforces orthogonality and analytic signal structure, yielding provably tighter generalization bounds and improved noise robustness (Venkatasubramanian et al., 16 Sep 2024).
  • Manifold-based convolutions and normalization: These exploit the geometry of the complex plane for equivariant processing in applications such as RF fingerprinting and near-field imaging (Smith, 2023).
  • Specialized optimizers and batchnorm enhancements: Practical improvements are being introduced to accommodate the dynamics of complex gradients and weight adaptation (Abdalla, 2023).

Challenges remain in designing truly bounded, fully holomorphic, and numerically stable activations; providing integrated support across major deep learning frameworks; and establishing theoretical generalization and capacity guarantees in $\mathbb{C}$ for deep architectures (Abdalla, 2023, Hammad, 27 Jul 2024, Chen et al., 2021).

7. Applications and Best Practices

Primary domains of application include:

  • Signal processing: Communications, radar, sonar, MRI, PolSAR, and spectral estimation—where complex feature spaces directly reflect the signal's physical properties (Bassey et al., 2021, Abdalla, 2023).
  • Computer vision and pattern recognition: Phase-sensitive object recognition and finer textural discrimination, as in iris or patch matching (Nguyen et al., 2020, Jiang et al., 2018).
  • Physics-informed learning: Tasks where modeling wave mechanics, quantum phenomena, or other amplitude/phase-coupled systems is critical.

For maximal advantage, CVNNs should be chosen when signal phase is part of the natural semantics, when compact representation of amplitude/phase relationships is required, or when the data-generating process is best described in the frequency domain. Practitioners should match parameter count fairly between real and complex networks, use variance-scaled complex initializations, and apply activation and normalization layers specifically designed for the complex domain. Spectral-norm regularization, early stopping based on tracked spectral complexity, and appropriate architectural choices are recommended to limit overfitting and optimize out-of-sample performance (Chen et al., 2021, Voigtlaender, 2020, Geuchen et al., 2023, Abdalla, 2023).

