Widely Linear Kernel Activation Functions
- Widely linear kernel activation functions extend standard KAFs by incorporating both the input and its complex conjugate, enabling richer nonlinear transformations.
- They combine kernel and pseudo-kernel responses with trainable complex coefficients to capture dependencies between real and imaginary parts efficiently.
- Empirical evaluations show WL-KAFs improve convergence speed and accuracy on complex pattern recognition tasks compared to conventional approaches.
Widely linear kernel activation functions (WL-KAFs) are a family of flexible, data-driven activation functions designed for complex-valued neural networks (CVNNs). These functions extend the kernel activation function (KAF) paradigm to the complex domain by using widely linear kernels, permitting richer nonlinear transformations that exploit both the input and its complex conjugate. WL-KAFs enable enhanced expressivity in CVNNs with minimal computational and parameter overhead, and have demonstrated improved performance on complex-valued pattern recognition tasks (Scardapane et al., 2019).
1. Background: Complex-Valued Neural Networks and KAFs
CVNNs generalize real-valued feedforward networks by allowing all weights, biases, and activations to be complex-valued. For an $L$-layer CVNN, the transformation is
$$f(\mathbf{x}) = \left(g_L \circ T_L \circ \cdots \circ g_1 \circ T_1\right)(\mathbf{x}),$$
where each layer computes a complex affine transformation followed by an activation, $\mathbf{h}_l = g_l\left(T_l(\mathbf{h}_{l-1})\right) = g_l\left(\mathbf{W}_l \mathbf{h}_{l-1} + \mathbf{b}_l\right)$, with $\mathbf{W}_l$ and $\mathbf{b}_l$ complex and $g_l$ applied elementwise. Training minimizes a loss that is typically the sum of a data-dependent term (e.g., squared error, complex cross-entropy) and a regularization term.
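To make the notation concrete, the following NumPy sketch implements one such layer for a single input vector; the layer sizes, the weight scaling, and the split-tanh placeholder activation are illustrative assumptions, not choices taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative layer sizes (assumptions, not taken from the paper).
n_in, n_out = 100, 100

# Complex weights W in C^{n_out x n_in} and bias b in C^{n_out}.
W = 0.05 * (rng.standard_normal((n_out, n_in)) + 1j * rng.standard_normal((n_out, n_in)))
b = np.zeros(n_out, dtype=np.complex128)

def layer_forward(h, activation):
    """One CVNN layer: complex affine map followed by an elementwise activation."""
    s = W @ h + b          # complex preactivation
    return activation(s)   # elementwise nonlinearity (a KAF or WL-KAF in later sections)

# Placeholder split-tanh activation, used here only to make the sketch runnable.
split_tanh = lambda s: np.tanh(s.real) + 1j * np.tanh(s.imag)
h0 = rng.standard_normal(n_in) + 1j * rng.standard_normal(n_in)
h1 = layer_forward(h0, split_tanh)
```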
The standard KAF framework sidesteps the need to select a fixed analytic activation by learning each neuron's activation function via a one-dimensional kernel expansion. Fixing a dictionary $\{d_m\}_{m=1}^{D}$ of $D$ elements, the activation is
$$g(s) = \sum_{m=1}^{D} \alpha_m\, \kappa(s, d_m),$$
where the mixing coefficients $\alpha_m \in \mathbb{C}$ are trainable and the kernel is usually the complex Gaussian, $\kappa(s, d) = \exp\!\left(-\gamma\,(s - \bar{d})^2\right)$ with bandwidth $\gamma > 0$.
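A minimal NumPy sketch of this standard complex KAF is given below; the $4 \times 4$ dictionary grid, its range, and the bandwidth are illustrative assumptions, while the kernel is the complexified Gaussian written above.

```python
import numpy as np

rng = np.random.default_rng(0)

def complex_gaussian(s, d, gamma=1.0):
    """Complexified Gaussian kernel: kappa(s, d) = exp(-gamma * (s - conj(d))**2)."""
    return np.exp(-gamma * (s - np.conj(d)) ** 2)

# Dictionary: a fixed grid in the complex plane (the 4x4 layout and [-2, 2] range
# are illustrative choices, consistent with dictionary sizes such as D = 16).
grid = np.linspace(-2.0, 2.0, 4)
d = (grid[:, None] + 1j * grid[None, :]).ravel()                                  # D = 16
alpha = 0.1 * (rng.standard_normal(d.size) + 1j * rng.standard_normal(d.size))   # trainable

def kaf(s):
    """Standard complex KAF: trainable linear combination of kernel evaluations."""
    return np.sum(alpha * complex_gaussian(s, d))

print(kaf(0.3 - 0.7j))
```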
While KAFs allow neuronwise activations tuned to data, standard complex KAFs are limited: they cannot model arbitrary dependencies between the real and imaginary parts of the preactivation $s$, because the underlying reproducing kernel Hilbert space (RKHS) representation imposes intrinsic constraints among its subblocks.
2. Widely Linear Kernel Activation Function Formulation
Widely linear kernels extend KAFs by incorporating a dependence on both $s$ and its conjugate $\bar{s}$, thus lifting the constraint that the expansion models only analytic functions. A widely linear kernel model is defined by a pair of functions $\kappa(s, d)$ and $\tilde{\kappa}(s, d)$,
where $\kappa$ is the original kernel and $\tilde{\kappa}$ is the so-called pseudo-kernel, which typically acts on the conjugated arguments.
The widely linear KAF (WL-KAF) for a single neuron is then
$$g(s) = \sum_{m=1}^{D} \alpha_m\, \kappa(s, d_m) + \sum_{m=1}^{D} \beta_m\, \tilde{\kappa}(s, d_m),$$
where both $\alpha_m$ and $\beta_m$ are complex and trainable. In practice, $\tilde{\kappa}$ is often built directly from the base kernel acting on conjugated arguments, giving a compact trainable form that reuses the same kernel machinery as the standard expansion. The number of trainable parameters per neuron remains of order $D$ complex coefficients, as in the standard KAF.
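The sketch below evaluates a single WL-KAF neuron under the assumptions of the previous snippet; the pseudo-kernel used here, the base kernel applied to the conjugated input, is a simple stand-in rather than the paper's specific construction (the Case 1 and Case 2 choices are described next).

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(s, d, gamma=1.0):
    """Base complex Gaussian kernel."""
    return np.exp(-gamma * (s - np.conj(d)) ** 2)

def pseudo_kernel(s, d, gamma=1.0):
    """Pseudo-kernel stand-in: the base kernel applied to the conjugated input.
    This is one simple choice, not the paper's Case 1/2 constructions."""
    return np.exp(-gamma * (np.conj(s) - np.conj(d)) ** 2)

grid = np.linspace(-2.0, 2.0, 4)
d = (grid[:, None] + 1j * grid[None, :]).ravel()                       # shared dictionary
alpha = 0.1 * (rng.standard_normal(d.size) + 1j * rng.standard_normal(d.size))
beta = 0.1 * (rng.standard_normal(d.size) + 1j * rng.standard_normal(d.size))

def wl_kaf(s):
    """Widely linear KAF: kernel expansion plus pseudo-kernel (conjugate) expansion."""
    return np.sum(alpha * kernel(s, d)) + np.sum(beta * pseudo_kernel(s, d))

print(wl_kaf(0.5 + 0.2j))
```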
Different WL-KAF flavors are defined via the choice of $\kappa$ and $\tilde{\kappa}$ (a code sketch of Case 1 follows below):
- Case 1 (independent real/imaginary parts): $\kappa(s, d) = \kappa_{RR}(s, d) + \kappa_{II}(s, d)$ and $\tilde{\kappa}(s, d) = \kappa_{RR}(s, d) - \kappa_{II}(s, d)$, where $\kappa_{RR}$ and $\kappa_{II}$ are independent real-valued Gaussian kernels with bandwidths $\gamma_R$ and $\gamma_I$.
- Case 2 (mixed-effects separable kernels): cross kernels $\kappa_{RI}$ and $\kappa_{IR}$ are retained in addition to $\kappa_{RR}$ and $\kappa_{II}$, giving $\kappa = \kappa_{RR} + \kappa_{II} + j(\kappa_{IR} - \kappa_{RI})$ and $\tilde{\kappa} = \kappa_{RR} - \kappa_{II} + j(\kappa_{IR} + \kappa_{RI})$, where each component kernel is real-valued (typically Gaussian) and the cross terms couple the real and imaginary parts.
These forms allow independent or coupled modeling of real and imaginary parts, and can recover the standard KAF as a special case.
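As referenced above, here is a sketch of the Case 1 construction; treating $\kappa_{RR}$ and $\kappa_{II}$ as real Gaussian kernels on the complex plane (viewed as $\mathbb{R}^2$) is an assumption of this sketch, as are the bandwidth values.

```python
import numpy as np

def real_gaussian(s, d, gamma):
    """Real-valued Gaussian kernel on the complex plane, seen as R^2."""
    return np.exp(-gamma * np.abs(s - d) ** 2)

def case1_kernels(s, d, gamma_r=1.0, gamma_i=1.0):
    """Case 1 (independent real/imaginary parts):
    kernel        = k_RR + k_II
    pseudo-kernel = k_RR - k_II
    with real Gaussians of bandwidths gamma_r and gamma_i (illustrative values)."""
    k_rr = real_gaussian(s, d, gamma_r)
    k_ii = real_gaussian(s, d, gamma_i)
    return k_rr + k_ii, k_rr - k_ii

kappa, kappa_tilde = case1_kernels(0.3 + 0.4j, -0.1 + 0.2j)
print(kappa, kappa_tilde)
```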
3. Training, Architecture, and Implementation
WL-KAFs are integrated into CVNN layers as drop-in replacements for analytic nonlinearities. The forward pass for a neuron computes the kernel and pseudo-kernel response vectors, uses the learned mixing coefficients, and outputs the linear combination:
- Linear preactivation: $s = \mathbf{w}^{\mathsf{T}}\mathbf{h} + b$, with complex weights $\mathbf{w}$ and bias $b$.
- For the neuron's preactivation $s$, compute the kernel responses $\kappa(s, d_m)$ and the pseudo-kernel responses $\tilde{\kappa}(s, d_m)$ for $m = 1, \dots, D$.
- Output: $g(s) = \sum_{m=1}^{D} \alpha_m\, \kappa(s, d_m) + \beta_m\, \tilde{\kappa}(s, d_m)$.
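A vectorized sketch of these three steps for a full layer is shown below; the array shapes, the shared dictionary, and the conjugate-input pseudo-kernel stand-in carry over from the earlier single-neuron sketch and are assumptions rather than the authors' implementation.

```python
import numpy as np

def wl_kaf_layer(H, W, b, d, alpha, beta, gamma=1.0):
    """Vectorized forward pass of a WL-KAF layer (shapes are assumptions).
    H: (batch, n_in) complex inputs; W: (n_out, n_in); b: (n_out,)
    d: (D,) complex dictionary shared by all neurons
    alpha, beta: (n_out, D) complex mixing coefficients."""
    S = H @ W.T + b                                        # 1) preactivations (batch, n_out)
    Se = S[..., None]                                      #    add a dictionary axis
    K = np.exp(-gamma * (Se - np.conj(d)) ** 2)            # 2) kernel responses (batch, n_out, D)
    Kt = np.exp(-gamma * (np.conj(Se) - np.conj(d)) ** 2)  #    pseudo-kernel responses
    return np.sum(alpha * K + beta * Kt, axis=-1)          # 3) outputs (batch, n_out)
```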
Gradients are propagated using Wirtinger (CR) calculus, treating $s$ and $\bar{s}$ as independent variables, so the derivatives of $g$ with respect to the mixing coefficients and the preactivation follow directly from the kernel and pseudo-kernel expansions and can be handled by standard automatic differentiation.
Training employs standard optimizers (e.g., Adagrad, Adam). Hyperparameters include the dictionary size $D$ (e.g., $16$ or $64$), the dictionary grid (uniformly spaced around zero in the complex plane), the kernel bandwidths ($\gamma$, or $\gamma_R$ and $\gamma_I$ in the widely linear cases), and the regularization constant. Dictionary elements and kernel bandwidths are typically initialized using heuristics and then fine-tuned by gradient descent. Early stopping and weight decay on the kernel coefficients are recommended.
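The following sketch illustrates the two initialization heuristics mentioned above, a uniform dictionary grid around zero and a median-heuristic bandwidth; the grid range is an illustrative choice.

```python
import numpy as np

def init_dictionary(side=4, lim=2.0):
    """Uniform grid of side*side complex elements around zero
    (lim is an illustrative range; side=4 gives D=16, side=8 gives D=64)."""
    g = np.linspace(-lim, lim, side)
    return (g[:, None] + 1j * g[None, :]).ravel()

def median_heuristic(d):
    """Median heuristic for the bandwidth: gamma = 1 / (2 * median(|d_i - d_j|)^2)."""
    dists = np.abs(d[:, None] - d[None, :])
    sigma = np.median(dists[dists > 0])
    return 1.0 / (2.0 * sigma ** 2)

d = init_dictionary(4)        # D = 16
gamma0 = median_heuristic(d)  # starting bandwidth, to be fine-tuned by gradient descent
```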
In terms of complexity:
- Each neuron requires $\mathcal{O}(D)$ complex mixing coefficients, the same order as the standard KAF.
- The forward/backward pass per neuron has $\mathcal{O}(D)$ cost for both the kernel and the pseudo-kernel expansions, effectively doubling the kernel computations relative to the standard KAF, though this is a minor constant factor.
4. Empirical Evaluation and Results
Performance was investigated on image-classification benchmarks transformed to the complex domain using 2D FFT, with the top 100 coefficients per image selected as vectors. Datasets included MNIST, Fashion-MNIST, EMNIST Digits, and Latin OCR.
Each model used three hidden layers of 100 complex neurons with KAF or WL-KAF activations; the output layer mapped the complex outputs to real values (e.g., via their magnitudes) before applying a softmax. Optimization used Adagrad with batch size 40, and the regularization constant was tuned by grid search.
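A sketch of this complex-domain preprocessing is given below; selecting the coefficients by largest magnitude, per image, is an assumption, since the setup above only states that the top 100 FFT coefficients are kept.

```python
import numpy as np

def fft_features(image, n_coeffs=100):
    """Map a real-valued image to a complex feature vector via the 2D FFT,
    keeping the n_coeffs largest-magnitude coefficients (assumed selection rule)."""
    F = np.fft.fft2(image).ravel()
    idx = np.argsort(-np.abs(F))[:n_coeffs]     # indices of dominant coefficients
    return F[idx].astype(np.complex64)

x = fft_features(np.random.rand(28, 28))        # e.g., one MNIST-sized image
```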
Test accuracy (mean ± std over five runs):

| Model | MNIST (%) | F-MNIST (%) | EMNIST-D (%) | Latin OCR (%) |
|---|---|---|---|---|
| Real-valued NN | 92.39 ± 0.10 | 71.08 ± 0.45 | 92.78 ± 1.25 | 39.01 ± 3.42 |
| Complex KAF | 97.18 ± 0.27 | 81.94 ± 0.91 | 98.11 ± 2.04 | 71.79 ± 2.40 |
| WL-KAF (Case 1) | 97.50 ± 0.41 | 77.29 ± 2.43 | 98.46 ± 0.12 | 74.57 ± 0.80 |
| WL-KAF (Case 2) | 96.22 ± 0.74 | 82.89 ± 1.09 | 99.03 ± 1.01 | 72.53 ± 0.36 |
WL-KAFs achieved performance improvements over standard KAFs that were statistically significant under paired $t$-tests. Convergence with WL-KAFs was also typically faster, plateauing at roughly 4,000 iterations compared to roughly 6,000 for standard KAFs (Scardapane et al., 2019).
5. Practical Considerations and Recommendations
- Expressiveness vs. cost: WL-KAFs provide a substantial gain in nonlinear modeling power with negligible increase in parameter count or computational footprint. Their use is preferred over standard KAFs except in highly constrained deployment scenarios.
- Case selection: Case 1 is suitable when the real and imaginary parts of the nonlinearity are approximately independent, minimizing hyperparameter requirements. Case 2 is indicated where modeling cross-correlation is necessary, e.g., in signal processing.
- Hyperparameter tuning:
- Dictionary: choose elements covering the typical range of preactivations (e.g., a uniform grid around zero in the complex plane).
- Bandwidth: initialize with the median heuristic or rules from the real-valued KAF literature, then allow further tuning by gradient descent.
- Dictionary size should remain moderate (16–64) to balance capacity and overfitting risk.
- Regularization and optimization:
- Apply weight decay to the kernel mixing coefficients.
- Employ early stopping based on validation loss.
- Adaptive optimizers such as Adagrad or Adam handle the disparity in gradient scales.
- Monitor gradient norms for real and imaginary components independently to maintain training stability.
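As a small illustration of the last item, the helper below aggregates the gradient norms of the real and imaginary components separately; it assumes the gradients are available as complex NumPy arrays (e.g., produced by a Wirtinger-calculus backward pass).

```python
import numpy as np

def split_grad_norms(grads):
    """Aggregate norms of the real and imaginary gradient components separately,
    over a list of complex gradient arrays, so the two can be tracked per iteration."""
    g_re = np.sqrt(sum(np.sum(g.real ** 2) for g in grads))
    g_im = np.sqrt(sum(np.sum(g.imag ** 2) for g in grads))
    return g_re, g_im
```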
For practical deployment, implementing WL-KAFs involves fixing the complex dictionary, coding the forward and backward routines for both the kernel and the pseudo-kernel, integrating with CVNN-capable libraries (e.g., TensorFlow, PyTorch), and tuning the principal hyperparameters: the dictionary size $D$, the kernel bandwidth $\gamma$ (optionally separate bandwidths $\gamma_R$ and $\gamma_I$), and the regularization constant. These steps suffice to equip CVNNs with neuron-specific, highly expressive nonlinearities suitable for a broad class of complex-valued learning problems (Scardapane et al., 2019).
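The module below is a minimal PyTorch sketch of such a WL-KAF layer; it assumes a PyTorch version with complex-tensor autograd support, reuses the conjugate-input pseudo-kernel stand-in from the earlier sketches, and is not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class WLKAF(nn.Module):
    """Elementwise WL-KAF layer with a shared complex dictionary and per-neuron
    complex mixing coefficients (a sketch, not the authors' reference code)."""
    def __init__(self, num_neurons, dict_side=4, lim=2.0, gamma=1.0):
        super().__init__()
        g = torch.linspace(-lim, lim, dict_side)
        re, im = g.repeat_interleave(dict_side), g.repeat(dict_side)
        self.register_buffer("dictionary", torch.complex(re, im))      # (D,) fixed grid
        D = dict_side * dict_side
        self.alpha = nn.Parameter(0.1 * torch.randn(num_neurons, D, dtype=torch.cfloat))
        self.beta = nn.Parameter(0.1 * torch.randn(num_neurons, D, dtype=torch.cfloat))
        self.gamma = gamma  # fixed here; could also be made trainable

    def forward(self, s):                            # s: (batch, num_neurons), complex
        se = s.unsqueeze(-1)                         # (batch, N, 1)
        dc = self.dictionary.conj()
        k = torch.exp(-self.gamma * (se - dc) ** 2)          # kernel responses
        kt = torch.exp(-self.gamma * (se.conj() - dc) ** 2)  # pseudo-kernel responses (stand-in)
        return (self.alpha * k + self.beta * kt).sum(dim=-1) # (batch, N) complex activations
```

In use, the final complex outputs still need a mapping to real values (e.g., magnitudes feeding a softmax, as in the experiments above) before computing a real-valued loss; optimizers such as Adam in recent PyTorch releases can update complex parameters directly.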
6. Context and Significance
Widely linear kernel activation functions address a foundational limitation of standard (analytic) KAFs by enabling the modeling of arbitrary dependencies between real and imaginary parts in complex-valued transformations. This is accomplished without increasing the number of trainable parameters per neuron. The observed empirical gains—in accuracy and convergence rate—across standard complex pattern recognition benchmarks underscore their utility in both scientific and engineering contexts. Their introduction represents a principled extension of KAF theory, contributing to the expressiveness and practicality of modern CVNNs (Scardapane et al., 2019).