Complex Kernel Activation Functions

Updated 10 May 2026

Complex KAFs are non-parametric activation functions for neural networks that use kernel expansions to adapt the nonlinearity during training.
They overcome limitations of split activations by modeling complex-valued interactions and phase-amplitude dependencies with kernels like the complex Gaussian.
Widely linear and multi-kernel extensions boost expressiveness, leading to faster convergence and superior performance in image and signal processing tasks.

Complex Kernel Activation Functions (KAFs) are a class of flexible, non-parametric activation functions for neural networks, distinguished by their construction via kernel expansions over fixed dictionaries in either the real or complex domain. Unlike traditional fixed-form activations, KAFs possess a large number of adaptable degrees of freedom per neuron, permitting data-driven adaptation of the nonlinearity itself during end-to-end training. In the complex domain, KAFs address longstanding challenges in complex-valued neural networks (CVNNs), such as properly modeling complex-valued nonlinearities and enabling expressivity beyond split or phase-amplitude techniques. Modern variants, including widely linear and multi-kernel KAFs, further enhance expressive power without significant overhead. Complex KAFs are now a central component in the study of universal approximation in CVNNs, empirical performance in signal and image processing, and the integration of RKHS theory with deep learning architectures.

1. Mathematical Formalism and Architectures

A standard complex kernel activation function applies to a scalar complex pre-activation $z \in \mathbb{C}$ and computes

$g(z) = \sum_{i=1}^D \alpha_i\, k(z, c_i),$

where $c_i \in \mathbb{C}$ are dictionary centers (atoms), $\alpha_i \in \mathbb{C}$ are learnable mixing coefficients, and $k: \mathbb{C} \times \mathbb{C} \to \mathbb{C}$ is a positive-definite complex kernel (commonly a complex Gaussian). Extension to a 2D dictionary leads to the fully complex KAF,

$g(z) = \sum_{n=1}^D \sum_{m=1}^D \alpha_{n,m} \, k(z, d_n + i d_m),$

where the $D^2$ atoms $\{d_n + i d_m\}$ are sampled as a $D \times D$ square grid in the complex plane (Scardapane et al., 2018). Matrix-vector implementations allow efficient batch computation, and the entire operation is differentiable with respect to both $\alpha$ and kernel hyperparameters.
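As a minimal sketch of the expansion above, the following pure-Python code builds a $D \times D$ grid of atoms and evaluates $g(z)$ for one pre-activation. The kernel form $\exp(-\gamma (z - c^*)^2)$, the grid range $[-2, 2]$, the bandwidth value, and all function names are illustrative assumptions rather than the cited authors' implementation.

```python
import cmath

def complex_gaussian(z, c, gamma):
    # Assumed kernel form: exp(-gamma * (z - conj(c))**2).
    return cmath.exp(-gamma * (z - c.conjugate()) ** 2)

def make_grid(D, lo=-2.0, hi=2.0):
    # D equally spaced points d_1..d_D; atoms are d_n + i*d_m (D*D in total).
    step = (hi - lo) / (D - 1)
    d = [lo + n * step for n in range(D)]
    return [complex(dn, dm) for dn in d for dm in d]

def kaf(z, alphas, atoms, gamma):
    # g(z) = sum_i alpha_i * k(z, c_i), with i running over all D*D atoms.
    return sum(a * complex_gaussian(z, c, gamma) for a, c in zip(alphas, atoms))

D = 5
atoms = make_grid(D)
alphas = [complex(0.1, -0.05)] * len(atoms)  # placeholder coefficients
print(kaf(0.3 + 0.4j, alphas, atoms, gamma=0.5))
```

In a real network the loop would be replaced by a batched matrix-vector product, with one coefficient vector per neuron and a shared grid of atoms.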

In multilayer CVNNs, each neuron may have its own distinct set of mixing coefficients, and dictionary atoms are chosen a priori, typically as a uniform grid over a subset of $\mathbb{C}$. The output is typically left complex-valued and paired with a real-valued loss function (e.g., cross-entropy applied to the real and imaginary parts or to a magnitude/phase representation).
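One way to obtain a real-valued loss from complex outputs can be sketched as follows, under the assumption that the class scores are the squared magnitudes of the output units. This is only one of the options mentioned above, and the function names are hypothetical.

```python
import math

def squared_magnitudes(outputs):
    # Map each complex output unit to a real score |o|^2 = Re(o)^2 + Im(o)^2.
    return [o.real ** 2 + o.imag ** 2 for o in outputs]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(outputs, label):
    # Real-valued loss for a complex-valued output layer.
    probs = softmax(squared_magnitudes(outputs))
    return -math.log(probs[label])

outputs = [0.2 + 0.1j, 1.5 - 0.5j, -0.3 + 0.3j]
print(cross_entropy(outputs, label=1))
```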

2. Kernel Choices and Widely Linear Extensions

The design and selection of the kernel function $k$ is pivotal for expressivity. Notable kernel families include:

Complex Gaussian kernel: $k(z, c) = \exp\left(-\gamma (z - c^*)^2\right)$ with bandwidth $\gamma > 0$, offering non-stationary, oscillatory behavior in $\mathbb{C}$.
Independent real-imaginary (widely linear) kernel: $k(z, c) = k_{\mathbb{R}}(\Re z, \Re c) + k_{\mathbb{R}}(\Im z, \Im c)$, with $k_{\mathbb{R}}$ a real-valued positive semidefinite kernel (Scardapane et al., 2018, Scardapane et al., 2019).
Polynomial and analytic kernels: including complex polynomial and Szegő kernels, provided they are positive-definite in the Hermitian sense.
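The first two kernel families can be compared with a short sketch. It assumes the complex Gaussian takes the form $\exp(-\gamma (z - c^*)^2)$ and that the independent kernel sums a real Gaussian over the real parts and one over the imaginary parts; the function names and test points are illustrative.

```python
import cmath
import math

def complex_gaussian(z, c, gamma=1.0):
    # exp(-gamma * (z - conj(c))**2); complex-valued in general.
    return cmath.exp(-gamma * (z - c.conjugate()) ** 2)

def real_gaussian(x, y, gamma=1.0):
    return math.exp(-gamma * (x - y) ** 2)

def independent_kernel(z, c, gamma=1.0):
    # Real kernel on the real parts plus real kernel on the imaginary parts;
    # the result is always real-valued.
    return real_gaussian(z.real, c.real, gamma) + real_gaussian(z.imag, c.imag, gamma)

z, c = 0.5 + 0.2j, -0.3 + 0.7j
print(complex_gaussian(z, c))    # a complex number with nonzero imaginary part
print(independent_kernel(z, c))  # a real number
# Hermitian symmetry k(z, c) == conj(k(c, z)) for the complex Gaussian:
print(abs(complex_gaussian(z, c) - complex_gaussian(c, z).conjugate()))
```

The oscillation of the complex Gaussian comes from the imaginary part of $(z - c^*)^2$, which the independent kernel lacks by construction.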

Expressiveness of standard (single-kernel) complex KAFs is limited by identities such as $k_{rr} = k_{ii}$ and $k_{ri} = -k_{ir}$ in the underlying vector-valued RKHS, where $k_{rr}$, $k_{ii}$, $k_{ri}$, and $k_{ir}$ denote the real kernels that couple the real and imaginary parts of the output. To overcome these, the widely linear KAF (WL-KAF) augments the model as

$g(z) = \sum_{i=1}^D \alpha_i\, k(z, c_i) + \sum_{i=1}^D \alpha_i^*\, \tilde{k}(z, c_i),$

where $\tilde{k}$ is a pseudo-kernel tied to $k$ (often the conjugate kernel) (Scardapane et al., 2019). Because the second sum reuses the conjugated coefficients $\alpha_i^*$, this generalization captures the full vector-valued structure without increasing the number of trainable parameters, and matches the theoretical representer theorem for complex RKHS.
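The effect of the pseudo-kernel term can be illustrated with a small sketch. It assumes the widely linear form $g(z) = \sum_i \alpha_i k(z, c_i) + \alpha_i^* \tilde{k}(z, c_i)$ with conjugated coefficients reused in the second term, and it builds $k = k_{rr} + k_{ii}$ and $\tilde{k} = k_{rr} - k_{ii}$ from two real Gaussians with separate bandwidths and zero cross-kernels. That construction, the bandwidth values, and every name are assumptions chosen for illustration, not the construction used in the cited paper.

```python
import math

def gauss(z, c, gamma):
    # Real Gaussian kernel on C viewed as R^2.
    return math.exp(-gamma * abs(z - c) ** 2)

def kernel_and_pseudo_kernel(z, c, gamma_r, gamma_i):
    # Assumption: k_rr and k_ii are Gaussians with separate bandwidths and the
    # cross-kernels are zero, so k = k_rr + k_ii and k_tilde = k_rr - k_ii.
    k_rr = gauss(z, c, gamma_r)
    k_ii = gauss(z, c, gamma_i)
    return k_rr + k_ii, k_rr - k_ii

def wl_kaf(z, alphas, atoms, gamma_r, gamma_i):
    # g(z) = sum_i alpha_i * k(z, c_i) + conj(alpha_i) * k_tilde(z, c_i)
    out = 0j
    for a, c in zip(alphas, atoms):
        k, k_tilde = kernel_and_pseudo_kernel(z, c, gamma_r, gamma_i)
        out += a * k + a.conjugate() * k_tilde
    return out

atoms = [complex(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
alphas = [complex(0.1 * n, -0.05 * n) for n in range(len(atoms))]
z = 0.4 - 0.2j

# Equal bandwidths: the pseudo-kernel vanishes and only the standard term remains.
print(wl_kaf(z, alphas, atoms, 0.5, 0.5))
# Different bandwidths: real and imaginary parts are shaped by different kernels.
print(wl_kaf(z, alphas, atoms, 0.5, 2.0))
```

Under these assumptions the output simplifies to $\sum_i 2\,\Re(\alpha_i)\, k_{rr} + 2i\,\Im(\alpha_i)\, k_{ii}$, so the real and imaginary parts of the activation no longer have to share a single kernel.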

Empirically, WL-KAFs consistently improve performance and convergence speed over standard KAFs in a range of complex-valued classification tasks.

3. Training, Regularization, and Practical Implementation

Training of complex KAFs is fully compatible with modern gradient-based optimizers. Backpropagation employs Wirtinger calculus to handle complex derivatives, and autodiff frameworks manage parameter updates for the mixing coefficients $\alpha$, the kernel bandwidth $\gamma$, and the network weights. Initialization typically employs complex kernel ridge regression to approximate a baseline activation (e.g., complex ELU) at the dictionary atoms.
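To make the Wirtinger update concrete, here is a hand-written sketch of steepest descent on the mixing coefficients of a single KAF for the squared error $L = |g(z) - t|^2$. For this loss the conjugate gradient is $\partial L / \partial \alpha_i^* = (g(z) - t)\, k(z, c_i)^*$. The kernel form, the $3 \times 3$ grid, the learning rate, and the target value are illustrative assumptions; an autodiff framework would compute the same quantity automatically.

```python
import cmath

def complex_gaussian(z, c, gamma):
    # Assumed kernel form: exp(-gamma * (z - conj(c))**2).
    return cmath.exp(-gamma * (z - c.conjugate()) ** 2)

def sgd_step(alphas, atoms, z, target, gamma, lr):
    # One steepest-descent step on L = |g(z) - target|^2 with respect to alpha.
    ks = [complex_gaussian(z, c, gamma) for c in atoms]
    g = sum(a * k for a, k in zip(alphas, ks))
    err = g - target
    # Wirtinger gradient: dL/d(conj(alpha_i)) = err * conj(k_i).
    new_alphas = [a - lr * err * k.conjugate() for a, k in zip(alphas, ks)]
    return new_alphas, abs(err) ** 2

atoms = [complex(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
alphas = [0j] * len(atoms)
z, target = 0.2 - 0.1j, 0.5 + 0.5j
losses = []
for _ in range(20):
    alphas, loss = sgd_step(alphas, atoms, z, target, gamma=0.5, lr=0.05)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss before the first and before the last step
```

Each step scales the error by $1 - \eta \sum_i |k(z, c_i)|^2$, so the loss shrinks as long as the learning rate $\eta$ keeps that factor inside $(-1, 1)$.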

Regularization strategies focus on weight decay (ℓ₂ penalty) on both the mixing coefficients $\alpha$ and the network weights, with hyperparameters selected by grid search or cross-validation. Bandwidth $\gamma$ may be initialized by the grid-spacing rule $\gamma = 1/(6\Delta^2)$, where $\Delta$ is the spacing between adjacent dictionary atoms, and optionally learned. There are no special requirements for maintaining holomorphicity, but care is needed to ensure positive-definite kernels and prevent phase collapse.
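The initialization described above can be sketched as a small kernel ridge regression: solve $(K + \varepsilon I)\,\alpha = t$, where $K_{ij} = k(c_i, c_j)$ and $t_j$ is the baseline activation evaluated at atom $c_j$. The sketch assumes the rule of thumb $\gamma = 1/(6\Delta^2)$ from the KAF literature, a split tanh as a stand-in baseline (not the complex ELU), a $3 \times 3$ grid, and $\varepsilon = 10^{-4}$. The tiny Gaussian-elimination solver is included only to keep the example dependency-free.

```python
import cmath
import math

def complex_gaussian(z, c, gamma):
    return cmath.exp(-gamma * (z - c.conjugate()) ** 2)

def grid_bandwidth(step):
    # Assumed rule of thumb: gamma = 1 / (6 * step**2), step = grid spacing.
    return 1.0 / (6.0 * step ** 2)

def split_tanh(z):
    # Stand-in baseline activation (hypothetical choice for this sketch).
    return complex(math.tanh(z.real), math.tanh(z.imag))

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small complex system.
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for j in range(col, n + 1):
                M[r][j] -= f * M[col][j]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def ridge_init(atoms, target_fn, gamma, eps=1e-4):
    # Solve (K + eps*I) alpha = t with K[i][j] = k(c_i, c_j), t[i] = target_fn(c_i).
    K = [[complex_gaussian(ci, cj, gamma) for cj in atoms] for ci in atoms]
    for i in range(len(atoms)):
        K[i][i] += eps
    return solve(K, [target_fn(c) for c in atoms])

step = 1.0
atoms = [complex(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
gamma = grid_bandwidth(step)
alphas = ridge_init(atoms, split_tanh, gamma)
fit = [sum(a * complex_gaussian(c, cj, gamma) for a, cj in zip(alphas, atoms)) for c in atoms]
print(max(abs(f - split_tanh(c)) for f, c in zip(fit, atoms)))  # fit error at the atoms
```

A larger $\varepsilon$ trades fit accuracy at the atoms for smaller, better-conditioned coefficients.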

Computational cost scales as $O(D)$ per neuron per layer (or $O(D^2)$ for fully complex KAFs with 2D grids); for WL-KAFs, cost is comparable to standard KAFs provided the pseudo-kernel is efficiently computed (e.g., as the complex conjugate).

4. Comparison with Alternative Activation Designs

Complex KAFs are contrasted with several alternative approaches to complex-valued activation functions:

Split activations: $g(z) = f(\Re z) + i\, f(\Im z)$ (e.g., split-ReLU, split-tanh). These treat real and imaginary parts independently but cannot model complex interactions or nonlinear phase-amplitude dependencies.
Phase-amplitude methods: $g(z) = f(|z|)\, e^{i \arg z}$ or modReLU, which are more restrictive and often introduce singularities or discontinuities.
Parametric activations: e.g., modReLU with learnable bias; flexible but of limited expressivity per neuron.
Fully analytic functions: e.g., $\tanh(z)$, $\operatorname{erf}(z)$, which are typically not universal in $\mathbb{C}$ and may yield pathological behaviors at singularities.
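For comparison, the fixed-form alternatives listed above take only a few lines each. The sketch below assumes a split tanh, an amplitude-phase tanh, and modReLU in its usual form $\mathrm{ReLU}(|z| + b)\, z / |z|$; the function names and the bias value are illustrative.

```python
import cmath
import math

def split_tanh(z):
    # Apply a real nonlinearity to the real and imaginary parts independently.
    return complex(math.tanh(z.real), math.tanh(z.imag))

def amplitude_phase_tanh(z):
    # Squash the magnitude and keep the phase unchanged.
    r, phi = cmath.polar(z)
    return cmath.rect(math.tanh(r), phi)

def mod_relu(z, b):
    # modReLU: ReLU(|z| + b) * z / |z|, defined as 0 at z = 0.
    r = abs(z)
    if r == 0.0:
        return 0j
    return max(r + b, 0.0) * z / r

z = 1.0 + 1.0j
print(split_tanh(z))
print(amplitude_phase_tanh(z))
print(mod_relu(z, b=-0.5))
```

Each of these has at most one learnable parameter per neuron (the modReLU bias), whereas a KAF adapts one coefficient per dictionary atom.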

KAFs, especially with widely linear or multi-kernel extensions, are provably universal approximators on compact subsets of $\mathbb{C}$ and are inherently smooth and data-adaptive by construction (Scardapane et al., 2018, Scardapane et al., 2017, Scardapane et al., 2019).

5. Empirical Performance and Applications

Extensive experimentation demonstrates the empirical advantages of complex KAFs:

Complex image classification: In FFT-transformed MNIST, Fashion-MNIST, EMNIST, and Latin-OCR benchmarks, complex KAFs and WL-KAFs achieve higher accuracy (up to 99.03%) and faster convergence than real NNs or standard CVNNs (Scardapane et al., 2019).
Channel equalization and signal prediction: On synthetic and real-world complex sequence data, KAFs deliver lower MSE and higher $R^2$ compared to parametric and fixed complex nonlinearities (Scardapane et al., 2018).
Generalization and stability: When configured with an appropriate kernel bandwidth, KAF networks exhibit uniform stability under SGD training and a reduced generalization gap (Cirillo et al., 2019).

Results consistently indicate that the gain in expressivity via non-parametric, kernel-based activations exceeds what can be achieved by increasing network depth or width with simpler activations.

6. Multi-Kernel and Multi-Dimensional Complex KAFs

Multi-kernel KAFs ("multi-KAFs") further enhance flexibility by forming a learned convex or linear mixture of multiple kernel types within each neuron:

$g(z) = \sum_{i=1}^D \alpha_i \sum_{m=1}^M \mu_m\, k_m(z, c_i),$

where $k_1, \dots, k_M$ are distinct complex kernels and $\mu_1, \dots, \mu_M$ are mixing coefficients (Scardapane et al., 2019). This enables adaptive kernel selection and automatic refinement of the activation shape to local data properties, leading to improved accuracy and faster convergence in complex-valued image recognition tasks.
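A multi-kernel activation can be sketched by mixing two kernels inside one expansion. The sketch assumes the form $g(z) = \sum_i \alpha_i \sum_m \mu_m k_m(z, c_i)$, a softmax over free logits to keep the mixture convex, and a complex Gaussian plus an independent Gaussian as the two kernels. These choices and all names are illustrative assumptions rather than the parameterization of the cited work.

```python
import cmath
import math

def complex_gaussian(z, c, gamma=0.5):
    return cmath.exp(-gamma * (z - c.conjugate()) ** 2)

def independent_gaussian(z, c, gamma=0.5):
    return (math.exp(-gamma * (z.real - c.real) ** 2)
            + math.exp(-gamma * (z.imag - c.imag) ** 2))

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def multi_kaf(z, alphas, atoms, kernels, mix_logits):
    # Convex mixture of kernels: mu = softmax(mix_logits), shared across atoms.
    mu = softmax(mix_logits)
    out = 0j
    for a, c in zip(alphas, atoms):
        mixed = sum(m * k(z, c) for m, k in zip(mu, kernels))
        out += a * mixed
    return out

atoms = [complex(x, y) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
alphas = [complex(0.1, 0.02 * n) for n in range(len(atoms))]
kernels = [complex_gaussian, independent_gaussian]
z = 0.3 + 0.6j
print(multi_kaf(z, alphas, atoms, kernels, mix_logits=[0.0, 0.0]))
```

Dropping the softmax and learning the $\mu_m$ directly would give the unconstrained linear mixture instead of the convex one.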

Multi-dimensional KAFs (e.g., 2D-KAFs) define

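$g(\mathbf{z}) = \sum_{i=1}^{D^2} \alpha_i\, k(\mathbf{z}, \mathbf{c}_i),$

where $\mathbf{z} = [z_1, z_2]^\top$ stacks a pair of pre-activations and the atoms $\mathbf{c}_i$ lie on a two-dimensional grid, allowing nonlinear interaction across pairs of channels or features (Scardapane et al., 2017, Jadon et al., 2019).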

7. Theoretical Guarantees and Open Problems

Complex KAFs, both standard and widely linear, are universal approximators for continuous mappings over compact subsets of $\mathbb{C}$. Their smoothness and boundedness properties admit direct application of stability and generalization analysis via the Lipschitz and β-smoothness of the associated empirical risk, ensuring well-behaved generalization gaps under SGD regimes (Cirillo et al., 2019).

Open challenges include:

Efficient scaling to very large dictionaries without prohibitive memory cost.
Automatically learning or adapting the dictionary atoms, rather than fixing them a priori.
Extension to structured or hierarchical kernels tailored for specific CVNN modules.
Optimal selection of kernel types and their mixtures, especially in high-dimensional complex domains (Scardapane et al., 2019).
Incorporating complex KAFs into advanced architectures such as gated RNNs or deep generative models operating on complex data (Scardapane et al., 2018).

Complex KAFs thus represent a scalable, theoretically principled, and highly expressive solution to activation design in complex-valued neural networks and related domains.