
Widely Linear Kernel Activation Functions

Updated 18 November 2025
  • Widely linear kernel activation functions extend standard KAFs by incorporating both the input and its complex conjugate, enabling richer nonlinear transformations.
  • They combine kernel and pseudo-kernel responses with trainable complex coefficients to capture dependencies between real and imaginary parts efficiently.
  • Empirical evaluations show WL-KAFs improve convergence speed and accuracy on complex pattern recognition tasks compared to conventional approaches.

Widely linear kernel activation functions (WL-KAFs) are a family of flexible, data-driven activation functions designed for complex-valued neural networks (CVNNs). These functions extend the kernel activation function (KAF) paradigm to the complex domain by using widely linear kernels, permitting richer nonlinear transformations that exploit both the input and its complex conjugate. WL-KAFs enable enhanced expressivity in CVNNs with minimal computational and parameter overhead, and have demonstrated improved performance on complex-valued pattern recognition tasks (Scardapane et al., 2019).

1. Background: Complex-Valued Neural Networks and KAFs

CVNNs generalize real-valued feedforward networks by allowing all weights, biases, and activations to be complex-valued. For an $L$-layer CVNN, the transformation is

$$f(x) = f^{(L)}(\cdots f^{(2)}(f^{(1)}(x)) \cdots),$$

where each layer computes a complex affine transformation followed by an activation: $h^{(i)} = W_i h^{(i-1)} + b_i$, $z^{(i)} = g(h^{(i)})$, with $W_i, b_i$ complex and $g: \mathbb{C} \to \mathbb{C}$ applied elementwise. Training minimizes a loss $J(w)$, typically a sum of a data-dependent loss (e.g., squared error, complex cross-entropy) and a regularization term.

The standard KAF framework sidesteps the need to select a fixed analytic activation $g$ by learning each neuron's activation function via a one-dimensional kernel expansion. Fixing a dictionary $\{d_1, \dots, d_D\} \subset \mathbb{C}$, the activation is

$$g(z) = \sum_{n=1}^D \alpha_n \kappa(z, d_n) = \mathbf{k}(z)^\top \alpha,$$

where $\alpha \in \mathbb{C}^D$ are trainable coefficients and the kernel $\kappa$ is usually the complex Gaussian, $\kappa(z, d) = \exp\left[-\gamma |z - d|^2\right]$.
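
As an illustration of this expansion, the following minimal NumPy sketch evaluates a complex-Gaussian KAF for a batch of pre-activations; the dictionary grid, bandwidth, and coefficients are illustrative placeholders rather than the paper's settings.

```python
import numpy as np

def complex_gaussian_kaf(z, dictionary, alpha, gamma=1.0):
    """Standard complex KAF: g(z) = sum_n alpha_n * exp(-gamma * |z - d_n|^2).

    z          -- complex scalar or array of pre-activations
    dictionary -- complex array of D fixed dictionary elements d_n
    alpha      -- complex array of D trainable mixing coefficients
    """
    # |z - d_n|^2 for every dictionary element (broadcast over the last axis)
    sq_dist = np.abs(z[..., None] - dictionary) ** 2
    k = np.exp(-gamma * sq_dist)   # real-valued kernel responses
    return k @ alpha               # complex linear combination

# Example: D = 16 dictionary elements on a small grid in the complex plane
grid = np.linspace(-2.0, 2.0, 4)
dictionary = (grid[:, None] + 1j * grid[None, :]).ravel()   # 16 elements
alpha = np.random.randn(16) + 1j * np.random.randn(16)
z = np.array([0.3 + 0.5j, -1.0 + 0.2j])
print(complex_gaussian_kaf(z, dictionary, alpha))
```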

While KAFs allow neuronwise activations tuned to data, standard complex KAFs are limited: they cannot model arbitrary dependencies between the real and imaginary parts of $z$, as they entail intrinsic constraints among subblocks of their reproducing kernel Hilbert space (RKHS) representations.

2. Widely Linear Kernel Activation Function Formulation

Widely linear kernels extend KAFs by incorporating a dependence on both $z$ and its conjugate $\bar{z}$, thus lifting the constraint that the expansion models only analytic functions. A widely linear kernel is defined as

$$k_\mathrm{WL}(z, z') = \kappa(z, z') + \widetilde{\kappa}(z, z'),$$

where $\kappa$ is the original kernel and $\widetilde{\kappa}$ is the so-called pseudo-kernel, typically involving the conjugate.

The widely linear KAF (WL-KAF) for a single neuron is then

$$g(z) = \sum_{n=1}^D \alpha_n \kappa(z, d_n) + \sum_{n=1}^D \beta_n \widetilde{\kappa}(z, d_n) = \mathbf{k}(z)^\top \alpha + \widetilde{\mathbf{k}}(z)^\top \beta,$$

where $\alpha, \beta \in \mathbb{C}^D$ are both trainable. In practice, $\beta$ is often set to $\alpha^*$, giving the compact trainable form $g(z) = \mathbf{k}(z)^\top \alpha + \widetilde{\mathbf{k}}(z)^\top \alpha^*$. The number of trainable parameters per neuron remains $D$ complex coefficients, as in the standard KAF.

Different WL-KAF flavors are defined via choices of $\kappa$ and $\widetilde{\kappa}$:

  • Case 1 (Independent real/imag parts):

$$\begin{aligned} \kappa(z, d) &= \exp\left[-\gamma_r(\Re\{z\} - \Re\{d\})^2 - \gamma_i(\Im\{z\} - \Im\{d\})^2\right], \\ \widetilde{\kappa}(z, d) &= \exp\left[-\gamma_r(\Re\{z\} - \Re\{d\})^2 + \gamma_i(\Im\{z\} - \Im\{d\})^2\right], \end{aligned}$$

with bandwidths $\gamma_r, \gamma_i$ for the real and imaginary parts.

  • Case 2 (Mixed-effects separable kernels):

$$\begin{aligned} \kappa(z, d) &= \sum_{q=1}^Q \kappa^q(z, d), \\ \widetilde{\kappa}(z, d) &= 2i \sum_{q=1}^Q \omega^q \, \widetilde{\kappa}^q(z, d), \end{aligned}$$

where each $\kappa^q, \widetilde{\kappa}^q$ is real-valued (typically Gaussian) and $0 < \omega^q < 1$.

These forms allow independent or coupled modeling of real and imaginary parts, and can recover the standard KAF as a special case.
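
As a concrete illustration, the sketch below evaluates a Case 1 WL-KAF with the tied choice $\beta = \alpha^*$; the bandwidths, dictionary grid, and coefficient scale are illustrative assumptions (NumPy).

```python
import numpy as np

def case1_kernels(z, d, gamma_r=1.0, gamma_i=1.0):
    """Case 1 kernel / pseudo-kernel pair with separate real/imaginary bandwidths."""
    dr = z.real[..., None] - d.real   # real-part differences, shape (..., D)
    di = z.imag[..., None] - d.imag   # imaginary-part differences
    kappa = np.exp(-gamma_r * dr**2 - gamma_i * di**2)      # kernel
    kappa_t = np.exp(-gamma_r * dr**2 + gamma_i * di**2)    # pseudo-kernel (sign flip)
    return kappa, kappa_t

def wl_kaf(z, d, alpha, gamma_r=1.0, gamma_i=1.0):
    """WL-KAF output g(z) = k(z)^T alpha + k~(z)^T alpha* (beta tied to alpha*)."""
    kappa, kappa_t = case1_kernels(z, d, gamma_r, gamma_i)
    return kappa @ alpha + kappa_t @ np.conj(alpha)

# Illustrative usage with a 4x4 dictionary grid over [-2, 2]^2
grid = np.linspace(-2.0, 2.0, 4)
d = (grid[:, None] + 1j * grid[None, :]).ravel()
alpha = 0.1 * (np.random.randn(d.size) + 1j * np.random.randn(d.size))
z = np.array([0.4 - 0.7j, 1.2 + 0.1j])
print(wl_kaf(z, d, alpha))
```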

3. Training, Architecture, and Implementation

WL-KAFs are integrated into CVNN layers as drop-in replacements for analytic nonlinearities. The forward pass for a neuron computes the kernel and pseudo-kernel response vectors, uses the learned mixing coefficients, and outputs the linear combination:

  1. Linear preactivation: $h = W_i h^{(i-1)} + b_i$.
  2. For neuron $j$, compute $\mathbf{k}_j = [\kappa(h_j, d_1), \dots, \kappa(h_j, d_D)]^\top$ and $\widetilde{\mathbf{k}}_j = [\widetilde{\kappa}(h_j, d_1), \dots, \widetilde{\kappa}(h_j, d_D)]^\top$.
  3. Output: $z_j = \mathbf{k}_j^\top \alpha_j + \widetilde{\mathbf{k}}_j^\top \beta_j$.

Gradients are propagated using

  • $\partial z_j / \partial \alpha_j = \mathbf{k}_j$
  • $\partial z_j / \partial \beta_j = \widetilde{\mathbf{k}}_j$
  • $\partial z_j / \partial h_j = (\partial \mathbf{k}_j / \partial h_j)^\top \alpha_j + (\partial \widetilde{\mathbf{k}}_j / \partial h_j)^\top \beta_j$

Training employs standard optimizers (e.g., Adagrad, Adam). Hyperparameters include the dictionary size $D$ (e.g., 16 or 64), the dictionary grid (uniform over $[-2,2]^2$), the kernel bandwidths ($\gamma$, $\gamma_r$, $\gamma_i$), and the regularization constant $C$. Dictionary elements and kernel bandwidths are typically initialized using heuristics and then fine-tuned by gradient descent. Early stopping and weight decay on the kernel coefficients are recommended.
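
The forward pass of steps 1–3 can be vectorized over a whole layer. The following NumPy sketch assumes Case 1 kernels, a dictionary shared across neurons, and tied coefficients $\beta_j = \alpha_j^*$; the listed gradients would in practice come from a framework's automatic differentiation, and all shapes and scales below are illustrative.

```python
import numpy as np

def wl_kaf_layer(h_prev, W, b, d, alpha, gamma_r=1.0, gamma_i=1.0):
    """One CVNN layer with WL-KAF activations (forward pass only).

    h_prev -- complex input of shape (batch, n_in)
    W, b   -- complex weights (n_out, n_in) and biases (n_out,)
    d      -- shared complex dictionary of shape (D,)
    alpha  -- per-neuron complex coefficients of shape (n_out, D)
    """
    h = h_prev @ W.T + b                                  # complex pre-activations, (batch, n_out)
    dr = h.real[..., None] - d.real                       # (batch, n_out, D)
    di = h.imag[..., None] - d.imag
    kappa = np.exp(-gamma_r * dr**2 - gamma_i * di**2)    # kernel responses
    kappa_t = np.exp(-gamma_r * dr**2 + gamma_i * di**2)  # pseudo-kernel responses
    # Per-neuron mixing: sum over the dictionary axis
    z = np.einsum('bnd,nd->bn', kappa, alpha) + np.einsum('bnd,nd->bn', kappa_t, np.conj(alpha))
    return z

# Illustrative shapes: batch of 8, 100 -> 100 complex neurons, D = 16
rng = np.random.default_rng(0)
h_prev = rng.standard_normal((8, 100)) + 1j * rng.standard_normal((8, 100))
W = 0.05 * (rng.standard_normal((100, 100)) + 1j * rng.standard_normal((100, 100)))
b = np.zeros(100, dtype=complex)
grid = np.linspace(-2.0, 2.0, 4)
d = (grid[:, None] + 1j * grid[None, :]).ravel()
alpha = 0.1 * (rng.standard_normal((100, 16)) + 1j * rng.standard_normal((100, 16)))
print(wl_kaf_layer(h_prev, W, b, d, alpha).shape)   # (8, 100)
```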

In terms of complexity:

  • Each neuron retains $D$ complex coefficients (as in the standard KAF).
  • The forward/backward pass per neuron has $O(D)$ cost for both the kernel and the pseudo-kernel, effectively doubling kernel computations relative to the standard KAF, though this is a minor constant factor.

4. Empirical Evaluation and Results

Performance was investigated on image-classification benchmarks transformed to the complex domain using the 2D FFT, with the top 100 coefficients per image retained, yielding feature vectors in $\mathbb{C}^{100}$. Datasets included MNIST, Fashion-MNIST, EMNIST Digits, and Latin OCR.
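
A hedged sketch of this preprocessing step is shown below; it assumes that "top 100 coefficients" means the 100 FFT coefficients of largest magnitude per image, which is an interpretation rather than a detail confirmed by the source.

```python
import numpy as np

def image_to_complex_features(img, n_coeffs=100):
    """Map a grayscale image to a complex feature vector via the 2D FFT.

    Keeps the n_coeffs coefficients of largest magnitude (assumed selection rule).
    """
    spectrum = np.fft.fft2(img.astype(float))          # complex 2D spectrum
    flat = spectrum.ravel()
    idx = np.argsort(np.abs(flat))[::-1][:n_coeffs]    # indices of largest-magnitude coefficients
    return flat[idx]                                   # complex feature vector

# Example with a random 28x28 "image" (MNIST-sized)
img = np.random.rand(28, 28)
x = image_to_complex_features(img)
print(x.shape, x.dtype)   # (100,) complex128
```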

Each model used three hidden layers of 100 complex neurons with KAF or WL-KAF activations; the output layer applied a softmax to $|\Re\{h\}|^2 + |\Im\{h\}|^2$. Optimization used Adagrad with batch size 40, and the regularization constant was tuned by grid search.
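
The real-valued readout on the complex output layer amounts to a softmax over the squared moduli of the output pre-activations; the short NumPy sketch below is a minimal illustration of that step.

```python
import numpy as np

def complex_softmax_readout(h_out):
    """Turn complex output pre-activations (batch, n_classes) into class probabilities."""
    scores = h_out.real**2 + h_out.imag**2               # |Re{h}|^2 + |Im{h}|^2, real-valued
    scores = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=1, keepdims=True)

h_out = np.random.randn(4, 10) + 1j * np.random.randn(4, 10)
probs = complex_softmax_readout(h_out)
print(probs.sum(axis=1))   # each row sums to 1
```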

Test accuracy (mean ± standard deviation over five runs):

Model              MNIST (%)       F-MNIST (%)     EMNIST-D (%)    Latin OCR (%)
Real-valued NN     92.39 ± 0.10    71.08 ± 0.45    92.78 ± 1.25    39.01 ± 3.42
Complex KAF        97.18 ± 0.27    81.94 ± 0.91    98.11 ± 2.04    71.79 ± 2.40
WL-KAF (Case 1)    97.50 ± 0.41    77.29 ± 2.43    98.46 ± 0.12    74.57 ± 0.80
WL-KAF (Case 2)    96.22 ± 0.74    82.89 ± 1.09    99.03 ± 1.01    72.53 ± 0.36

WL-KAFs achieved performance improvements over standard KAFs that were statistically significant at $p < 0.05$ under paired $t$-tests. Convergence with WL-KAFs was typically faster, plateauing at roughly 4,000 iterations compared to roughly 6,000 for standard KAFs (Scardapane et al., 2019).

5. Practical Considerations and Recommendations

  • Expressiveness vs. cost: WL-KAFs provide a substantial gain in nonlinear modeling power with negligible increase in parameter count or computational footprint. Their use is preferred over standard KAFs except in highly constrained deployment scenarios.
  • Case selection: Case 1 is suitable when the real and imaginary parts of the nonlinearity are approximately independent, minimizing hyperparameter requirements. Case 2 is indicated where modeling cross-correlation is necessary, e.g., in signal processing.
  • Hyperparameter tuning:
    • Dictionary: choose elements covering the typical activation range (e.g., uniform in $[-2,2]^2$).
    • Bandwidth: initialize by the median heuristic or rules from the real KAF literature, then allow further tuning (a median-heuristic sketch is given after this list).
    • Dictionary size $D$ should remain moderate (16–64) to balance capacity and overfitting risk.
  • Regularization and optimization:
    • Apply weight decay on the $\alpha$ coefficients.
    • Employ early stopping based on validation loss.
    • Adaptive optimizers such as Adagrad or Adam handle the disparity in gradient scales.
    • Monitor gradient norms for real and imaginary components independently to maintain training stability.
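
One common median-heuristic variant for initializing the Gaussian bandwidth is sketched below; the specific rule and its constant are generic assumptions, not necessarily the initialization used in the original paper.

```python
import numpy as np

def median_heuristic_gamma(preactivations):
    """Initialize gamma from a sample of complex pre-activations.

    Uses gamma = 1 / (2 * median(|z_i - z_j|^2)) over distinct pairs -- a generic
    median heuristic, not necessarily the rule from the original paper.
    """
    z = np.asarray(preactivations).ravel()
    sq_dists = np.abs(z[:, None] - z[None, :]) ** 2           # pairwise squared distances
    med = np.median(sq_dists[np.triu_indices_from(sq_dists, k=1)])
    return 1.0 / (2.0 * med)

sample = np.random.randn(200) + 1j * np.random.randn(200)
print(median_heuristic_gamma(sample))
```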

For practical deployment, implementing WL-KAFs involves fixing the complex dictionary, coding the forward and backward routines for both the kernel and the pseudo-kernel, integrating with CVNN libraries (e.g., TensorFlow, PyTorch), and tuning the principal hyperparameters ($D$, $\gamma$, $C$, and optionally $Q$ and $\{\omega^q\}$). These steps suffice to equip CVNNs with neuron-specific, highly expressive nonlinearities suitable for a broad class of complex-valued learning problems (Scardapane et al., 2019).

6. Context and Significance

Widely linear kernel activation functions address a foundational limitation of standard (analytic) KAFs by enabling the modeling of arbitrary dependencies between real and imaginary parts in complex-valued transformations. This is accomplished without increasing the number of trainable parameters per neuron. The observed empirical gains—in accuracy and convergence rate—across standard complex pattern recognition benchmarks underscore their utility in both scientific and engineering contexts. Their introduction represents a principled extension of KAF theory, contributing to the expressiveness and practicality of modern CVNNs (Scardapane et al., 2019).
