Deep Complex Networks

Updated 23 June 2026

Deep complex networks are multi-layer neural architectures defined over complex numbers, enabling direct modeling of phase and magnitude data from sources like MRI and audio.
They integrate complex-valued convolutions, specialized activations (e.g., modReLU, CReLU), and tailored normalization techniques to achieve superior representational capacity.
Techniques such as Wirtinger calculus and advanced initialization schemes ensure effective backpropagation and universality for a range of applications from signal processing to wireless communications.

A deep complex network is a multi-layer neural architecture whose parameters, activations, and (optionally) intermediate operations are defined natively over the field of complex numbers. Such architectures exploit the richer algebraic structure of ℂ to better represent and process data sources with native phase and magnitude structure, including, but not limited to, MRI, audio, radar, and wireless communications. Modern deep complex networks incorporate complex-valued convolutions, batch normalization, initialization, and activations, and they are trained end-to-end using backpropagation in the complex domain. Recent work has established the superior representational capacity, universality, and practical advantages of deep complex-valued neural networks (DCNNs) across a spectrum of disciplines.

1. Core Arithmetic and Architectural Principles

A defining feature of deep complex networks is the use of complex-valued linear and nonlinear primitives. The foundational operation is the complex-valued convolution. Given complex-valued signals $f$ and $g$, their discrete convolution at index $z$ is

$(f * g)(z) = \sum_k f(k) g(z-k)$

By expressing each variable in real and imaginary components, e.g., $f = X + iY$, $g = a + ib$, the convolution decouples into four real convolutions: $(X + iY) * (a + ib) = (X*a - Y*b) + i(Y*a + X*b)$. Analogous decompositions underpin complex linear layers, facilitating direct implementation in mainstream deep learning toolkits as grouped real-valued layers with explicit channel management (Trabelsi et al., 2017, Cole et al., 2020, Zhao et al., 2018).
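The four-convolution decomposition can be checked numerically. The following is a minimal NumPy sketch for the 1-D case; the function name `complex_conv_via_real` and the random test signals are illustrative assumptions, not part of any cited implementation.

```python
import numpy as np

def complex_conv_via_real(f, g):
    """Full 1-D convolution of two complex sequences using four real convolutions."""
    X, Y = f.real, f.imag          # f = X + iY
    a, b = g.real, g.imag          # g = a + ib
    real_part = np.convolve(X, a) - np.convolve(Y, b)
    imag_part = np.convolve(Y, a) + np.convolve(X, b)
    return real_part + 1j * imag_part

# Hypothetical test signals: a length-8 input and a length-3 kernel.
rng = np.random.default_rng(0)
f = rng.normal(size=8) + 1j * rng.normal(size=8)
g = rng.normal(size=3) + 1j * rng.normal(size=3)

out = complex_conv_via_real(f, g)
reference = np.convolve(f, g)      # NumPy's native complex convolution
print(np.allclose(out, reference))
```

The same bookkeeping carries over to 2-D convolutions and dense layers: real and imaginary parts are stored as separate channels and combined with the signs shown above.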

At the nonlinear stage, activation functions for complex-valued networks must preserve at least phase information if not the full analytic structure. Architectures make use of activations such as:

CReLU: $\mathrm{CReLU}(z) = \mathrm{ReLU}(\Re z) + i\,\mathrm{ReLU}(\Im z)$
modReLU: $\mathrm{modReLU}(z) = \mathrm{ReLU}(|z| + b)\,\frac{z}{|z|}$, with $b$ a learnable real bias
zReLU: $\mathrm{zReLU}(z) = z$ if $\arg z \in [0, \pi/2]$, else $0$
cardioid: $\mathrm{cardioid}(z) = \tfrac{1}{2}\bigl(1 + \cos(\arg z)\bigr)\, z$

Certain activation functions, such as CReLU and modReLU, have been empirically found to outperform others depending on application (Trabelsi et al., 2017, Cole et al., 2020).
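As a rough illustration, the activations listed above can be written in a few lines of NumPy. This sketch assumes the commonly cited definitions (zReLU passes inputs whose phase lies in the first quadrant; the cardioid scales the input by $\tfrac{1}{2}(1 + \cos(\arg z))$); the small `eps` in `modrelu` is an added guard against division by zero and is not part of the definition.

```python
import numpy as np

def crelu(z):
    # ReLU applied separately to the real and imaginary parts.
    return np.maximum(z.real, 0.0) + 1j * np.maximum(z.imag, 0.0)

def modrelu(z, b, eps=1e-12):
    # Rescales the magnitude by ReLU(|z| + b) and keeps the phase unchanged.
    mag = np.abs(z)
    return np.maximum(mag + b, 0.0) * z / (mag + eps)

def zrelu(z):
    # Passes z when its phase lies in [0, pi/2], i.e. Re z >= 0 and Im z >= 0.
    mask = (z.real >= 0) & (z.imag >= 0)
    return np.where(mask, z, 0)

def cardioid(z):
    # Phase-dependent attenuation; reduces to ReLU on the real axis.
    return 0.5 * (1.0 + np.cos(np.angle(z))) * z

z = np.array([1 + 1j, -1 + 1j, -1 - 1j, 2 - 2j])
print(crelu(z))
print(modrelu(z, b=-0.5))
print(zrelu(z))
print(cardioid(z))
```

Note that modReLU and the cardioid preserve the input phase wherever the output is nonzero, whereas CReLU generally changes it.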

2. Initialization and Normalization in the Complex Domain

Weight initialization generalizes the Xavier (Glorot) and He criteria to the complex case by drawing magnitudes $|W|$ from a Rayleigh distribution with parameter $\sigma$ set to match the target variance, and phases $\theta$ uniformly in $[-\pi, \pi]$, so $\mathbb{E}[W] = 0$ and $\mathrm{Var}(W) = 2\sigma^2$. The Glorot and He generalizations dictate $\sigma$ as $1/\sqrt{n_{\mathrm{in}} + n_{\mathrm{out}}}$ or $1/\sqrt{n_{\mathrm{in}}}$, respectively, where $n_{\mathrm{in}}$ and $n_{\mathrm{out}}$ are the numbers of input and output units (Trabelsi et al., 2017).
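A minimal sketch of this polar-form initialization follows. It assumes the Rayleigh scale is $\sigma = 1/\sqrt{n_{\mathrm{in}} + n_{\mathrm{out}}}$ for the Glorot criterion and $\sigma = 1/\sqrt{n_{\mathrm{in}}}$ for the He criterion, so that $\mathrm{Var}(W) = 2\sigma^2$; the function name and its arguments are hypothetical.

```python
import numpy as np

def complex_rayleigh_init(fan_in, fan_out, criterion="glorot", rng=None):
    """Draw a (fan_out, fan_in) complex weight matrix in polar form."""
    rng = np.random.default_rng() if rng is None else rng
    if criterion == "glorot":
        sigma = 1.0 / np.sqrt(fan_in + fan_out)
    elif criterion == "he":
        sigma = 1.0 / np.sqrt(fan_in)
    else:
        raise ValueError("criterion must be 'glorot' or 'he'")
    # Rayleigh-distributed magnitude, uniformly distributed phase.
    magnitude = rng.rayleigh(scale=sigma, size=(fan_out, fan_in))
    phase = rng.uniform(-np.pi, np.pi, size=(fan_out, fan_in))
    return magnitude * np.exp(1j * phase)

W = complex_rayleigh_init(256, 256, criterion="he", rng=np.random.default_rng(0))
empirical_var = np.mean(np.abs(W) ** 2)   # estimates Var(W) since E[W] = 0
target_var = 2.0 / 256                    # 2 * sigma^2 under the He criterion
print(empirical_var, target_var)
```

Because the phase is uniform, the mean of the weights is zero and the variance is carried entirely by the Rayleigh magnitude.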

Batch normalization for complex activations cannot simply divide by a scalar variance. Instead, the joint distribution over real and imaginary parts is whitened by centering and multiplying with the inverse square root of the $2 \times 2$ covariance matrix for each feature. Learnable affine parameters extend to a bias in ℂ and a $2 \times 2$ symmetric scaling matrix, generalizing gain and shift (Trabelsi et al., 2017).
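The whitening step can be sketched as follows, treating each complex feature as a 2-vector of real and imaginary parts. This is an illustrative NumPy version only: the inverse square root is computed by eigendecomposition, `eps` is an assumed regularizer, and the learnable affine step and running statistics are omitted.

```python
import numpy as np

def complex_whiten(z, eps=1e-5):
    """Whiten a (batch, features) complex array per feature."""
    x = np.stack([z.real, z.imag], axis=-1)            # (B, F, 2)
    x = x - x.mean(axis=0, keepdims=True)              # center both parts
    # Per-feature 2x2 covariance of (real, imag), regularized by eps.
    cov = np.einsum("bfi,bfj->fij", x, x) / x.shape[0]
    cov = cov + eps * np.eye(2)
    # Inverse matrix square root V diag(lambda^-1/2) V^T for each feature.
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = np.einsum("fik,fk,fjk->fij", evecs, evals ** -0.5, evecs)
    y = np.einsum("fij,bfj->bfi", inv_sqrt, x)
    return y[..., 0] + 1j * y[..., 1]

# Hypothetical batch with strongly correlated real and imaginary parts.
rng = np.random.default_rng(0)
re = rng.normal(size=(512, 3))
im = 0.8 * re + 0.3 * rng.normal(size=(512, 3)) + 2.0
z_batch = (re + 1.0) + 1j * im

y_batch = complex_whiten(z_batch)
parts = np.stack([y_batch.real, y_batch.imag], axis=-1)
out_cov = np.einsum("bfi,bfj->fij", parts, parts) / parts.shape[0]
print(np.round(out_cov, 3))   # close to the 2x2 identity for each feature
```

Dividing by a scalar standard deviation would leave the correlation between real and imaginary parts untouched, which is why the full matrix is used.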

For non-holomorphic architectures such as deep complex RBF or networks with phase-transmittance transparency, initialization must account for the variance of both centers and weights at every layer, tying them to network width and input distribution to achieve non-saturated preactivations and enable convergence in deep settings (Soares et al., 2024).

3. Optimization and Backpropagation in ℂ

Gradient-based learning in deep complex networks leverages Wirtinger calculus, which provides well-defined derivatives of a real-valued loss $L$ with respect to complex parameters. For $z = x + iy$: $\frac{\partial L}{\partial z} = \frac{1}{2}\left(\frac{\partial L}{\partial x} - i\,\frac{\partial L}{\partial y}\right)$, $\frac{\partial L}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial L}{\partial x} + i\,\frac{\partial L}{\partial y}\right)$. Practically, modern frameworks automatically separate and recombine gradients with respect to $x$ and $y$ at each layer (Trabelsi et al., 2017, Cole et al., 2020).
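A small worked example may make this concrete. The sketch below uses the standard Wirtinger definition $\partial L / \partial \bar{w} = \tfrac{1}{2}(\partial L/\partial x + i\,\partial L/\partial y)$ for $w = x + iy$ and the toy loss $L(w) = |wz - t|^2$, whose conjugate derivative is $(wz - t)\,\bar{z}$. The toy loss, the values of `w`, `z`, `t`, and the step size are all illustrative assumptions.

```python
import numpy as np

def loss(w, z, t):
    # Real-valued loss of a single complex weight.
    return np.abs(w * z - t) ** 2

def wirtinger_conj_grad(w, z, t):
    # Analytic dL/d(conj w) for L = |w z - t|^2.
    return (w * z - t) * np.conj(z)

def numeric_conj_grad(w, z, t, h=1e-6):
    # Central differences along the real and imaginary axes of w,
    # recombined as 0.5 * (dL/dx + i dL/dy).
    dx = (loss(w + h, z, t) - loss(w - h, z, t)) / (2 * h)
    dy = (loss(w + 1j * h, z, t) - loss(w - 1j * h, z, t)) / (2 * h)
    return 0.5 * (dx + 1j * dy)

w, z, t = 0.3 - 0.7j, 1.5 + 0.5j, -0.2 + 1.0j
analytic = wirtinger_conj_grad(w, z, t)
numeric = numeric_conj_grad(w, z, t)
print(analytic, numeric)

# Steepest descent follows the negative conjugate derivative.
w_fit = w
for _ in range(200):
    w_fit = w_fit - 0.1 * wirtinger_conj_grad(w_fit, z, t)
print(abs(w_fit * z - t))
```

The descent loop converges to $w = t / z$, which shows that the conjugate derivative is the quantity a gradient step should follow for a real-valued loss.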

In variational or Bayesian settings, sparsification and uncertainty modeling are implemented via complex circular Gaussian posteriors on weights, with additive-noise parameterization supporting efficient reparameterization and sampling (Nazarov et al., 2020).
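The additive-noise parameterization can be illustrated with a short sampling sketch. It assumes a posterior of the form $w = \mu + \sigma\,\varepsilon$ with $\varepsilon$ a standard circular complex Gaussian (independent real and imaginary parts, each with variance $1/2$); the function name and the example values of `mu` and `sigma` are hypothetical and do not reproduce the cited method in detail.

```python
import numpy as np

def sample_circular_gaussian(mu, sigma, n_samples, rng):
    """Reparameterized samples w = mu + sigma * eps, eps ~ CN(0, 1)."""
    shape = (n_samples,) + mu.shape
    # Standard circular complex noise: E|eps|^2 = 1 and E[eps^2] = 0.
    eps = (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2.0)
    return mu + sigma * eps

rng = np.random.default_rng(0)
mu = np.array([0.5 - 1.0j, -0.2 + 0.3j])   # posterior means (illustrative)
sigma = np.array([0.1, 0.5])               # posterior scales (illustrative)

samples = sample_circular_gaussian(mu, sigma, 200_000, rng)
dev = samples - mu
print(samples.mean(axis=0))                # close to mu
print(np.mean(np.abs(dev) ** 2, axis=0))   # close to sigma**2
print(np.abs(np.mean(dev ** 2, axis=0)))   # pseudo-variance, close to 0
```

Because the noise enters additively and does not depend on the parameters, gradients with respect to `mu` and `sigma` can flow through the sample, which is what makes the reparameterization useful for variational training.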

4. Universality and Expressivity of Deep Complex Networks

The universality of deep complex networks is contingent on the analytic properties of their activation functions. It is established that networks with uniform hidden width $2n + 2m + 5$ and non-holomorphic, non-antiholomorphic, non-$\mathbb{R}$-affine activations are universal for continuous functions $f: \mathbb{C}^n \to \mathbb{C}^m$ on compact domains $K \subset \mathbb{C}^n$ (Geuchen et al., 2023). For non-polyharmonic activations, width can be reduced to $n + m + 4$. Holomorphic or $\mathbb{R}$-affine activations are insufficient, as their compositions are closed within restricted function classes.

This result sharpens the contrast with the real case and demonstrates that phase-sensitive nonlinearity, such as modReLU or cardioid, is not merely a design choice but a theoretical necessity for universal approximation in ℂ. Deep narrow complex networks achieve dimension-independent approximation error with increasing depth, providing expressivity even at moderate width.

5. Specialized Architectures and Applications

Convolutional and Residual Architectures

Complex-valued convolutional networks have been applied natively to MRI image reconstruction, achieving PSNR and SSIM gains over parameter-matched real-valued counterparts (Cole et al., 2020). Architectures explored include unrolled ISTA-style networks and complex U-Nets, with inputs and outputs represented naturally as complex tensors throughout. These models preserve phase information, which is critical for tasks such as flow imaging or phase-contrast angiography.

Complex ResNets and ConvLSTMs display improved learning of phase and spatial structure in audio, spectrogram, and music transcription tasks (Trabelsi et al., 2017).

Deep Complex Radial Basis Function Networks

Deep C-RBF architectures, where every hidden unit is a complex-weighted, real (or optionally phase-transmittance) RBF centered at a point in ℂ, have demonstrated superior convergence and denoising in high-dimensional communications tasks. Only parameter initializations tied to layer width and variance (PT-RBF) achieve reliable convergence in deep settings; conventional (random, $k$-means, constellation) initializations fail for multi-layer deployments (Soares et al., 2024).

Bayesian Sparsification

Bayesian methods for parameter pruning in complex-valued nets utilize complex Gaussian approximate posteriors with either Variational Dropout or ARD priors, supporting high compression ratios in vision and audio benchmarks at minor performance cost (Nazarov et al., 2020).

Domain-Specific Deployments

Wireless: Replacement of FFT-based OFDM receivers with deep complex-valued convolutional networks yields BER improvements over LMMSE in adverse fading (Zhao et al., 2018).
Biological modeling: Neuronal synchrony and dynamic binding in Boltzmann machines are achieved via phase-based complex units (Reichert et al., 2013).
Graph/network prediction: Deep learning preserves scale-free and small-world properties in predicted links within complex networks (Tanaka et al., 2021).

6. Model Interpretability and Attribution

Interpretation of complex-valued neural networks requires attributions that respect their algebraic structure. DeepCSHAP extends Shapley-value-based methods from ℝ to ℂ, employing a complex chain rule and Wirtinger multipliers to compute additive feature attributions with provable Local Accuracy and Missingness (Eilers et al., 2024). Complementary gradient-based explanations adapt complex saliency, integrated gradients, and guided backpropagation. Empirically, DeepCSHAP improves localization and attribution quality, outperforming magnitude-only or quadrant-based explanations, across both real and complex-valued datasets.

An open-source PyTorch-compatible library provides differentiable access to all supported complex layers and attributors, enabling rigorous post hoc explanation of any compatible architecture.

7. Limitations, Current Trends, and Outlook

Despite empirical and theoretical advances, several issues constrain deep complex networks:

Specialized normalization (e.g., batch norm) and regularization beyond naive real-to-complex extensions remain underexplored (Trabelsi et al., 2017, Cole et al., 2020).
Nontrivial computational overhead: a complex convolution incurs four real convolutions per complex filter.
Activation design in the complex domain is ad hoc; holomorphic activations preclude universality.
Reliable initialization and training of deep, nonholomorphic architectures (notably RBF networks) require careful parameter scaling (Soares et al., 2024).
Extension to hypercomplex (quaternion, octonion) and structured or split-complex domains, as well as applications to additional imaging or physical modalities, remains open.

Future research is directed toward:

Learnable and kernelized complex activation functions.
Complex batch normalization tailored for non-holomorphic networks.
Hypercomplex network design and phase-driven attention/routing.
Scalable, hardware-efficient implementations.
Principled initialization and learning rules across deep, wide, and recurrent complex networks.

The synergy of algebraic structure and architectural depth establishes deep complex networks as an essential class for modeling phase-magnitude data, outperforming real-valued surrogates and opening new avenues in scientific machine learning and signal processing (Trabelsi et al., 2017, Cole et al., 2020, Soares et al., 2024).