
Complex Activation Functions Overview

Updated 26 March 2026
  • Complex activation functions are advanced neural network nonlinearities that utilize flexible, data-driven parameterizations to improve expressivity and convergence.
  • They encompass a variety of approaches, including parameterized, kernel-based, complex-valued, polynomial, logical, and CDF-based methods tailored for specific computational domains.
  • Empirical results demonstrate that these functions boost accuracy and training speed while enabling secure inference and robust performance across diverse architectures.

Complex activation functions generalize the traditional nonlinearity in neural networks by introducing richer parameterizations, flexible data-driven shapes, and adaptations tailored to specialized computational domains such as secure inference and complex-valued neural architectures. These functions target improved expressivity, convergence speed, task-specific adaptation, or mathematical compatibility with underlying number systems, and now represent an essential part of contemporary advanced neural modeling.

1. Parameterized and Adaptive Activation Functions

Modern deep learning increasingly relies on parameterized forms of classical activations, enabling each layer or neuron to tune key shape aspects beyond simple monotonicity or thresholding. The "Adaptively Customizing Activation Functions" framework introduces a minimal-parameter augmentation to any base activation $f$ by wrapping it as:

$$f_A(a_i, b_i, c_i, d_i; z) = b_i \cdot f(a_i z + c_i) + d_i,$$

where $(a_i, b_i, c_i, d_i)$ are trainable per-layer scalars. This parameterization confers several advantages:

  • Input scaling ($a_i$) dynamically sharpens or flattens the nonlinearity.
  • Input/output shifting ($c_i$, $d_i$) adjusts the centering and baseline of the activation.
  • Output scaling ($b_i$) modulates the activation amplitude.

Initializing $a_i = b_i = 1$, $c_i = d_i = 0$ exactly recovers the base function, so fixed activations such as ReLU and PReLU are special cases.
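
A minimal PyTorch sketch of this wrapper, assuming a per-layer scalar parameterization (class and argument names here are illustrative, not the reference implementation):

```python
import torch
import torch.nn as nn

class AdaptiveActivation(nn.Module):
    """Sketch of the four-parameter wrapper f_A(z) = b * f(a*z + c) + d.

    `base` is any elementwise nonlinearity (e.g. torch.relu); the four scalars
    are trained per layer. Initializing a = b = 1, c = d = 0 recovers `base`.
    """
    def __init__(self, base=torch.relu):
        super().__init__()
        self.base = base
        self.a = nn.Parameter(torch.tensor(1.0))  # input scaling
        self.b = nn.Parameter(torch.tensor(1.0))  # output scaling
        self.c = nn.Parameter(torch.tensor(0.0))  # input shift
        self.d = nn.Parameter(torch.tensor(0.0))  # output shift

    def forward(self, z):
        return self.b * self.base(self.a * z + self.c) + self.d

# Drop-in use: replace nn.ReLU() with AdaptiveActivation(torch.relu);
# the optimizer treats a, b, c, d exactly like weights and biases.
```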

Empirically, this approach yields equal or superior accuracy and notably faster early-phase convergence (elevated $|\Delta w|$ in early epochs) compared to both fixed and more sophisticated adaptive activations such as Swish, across models (AlexNet, VGGNet, GoogLeNet, ResNet, DenseNet) and tasks (CIFAR, PASCAL VOC, COCO) (Hu et al., 2021). Integrating the four per-layer parameters adds minimal overhead and requires no changes to the loss or optimizer.

2. Non-Parametric and Kernel-Based Activation Functions

Beyond low-parameter adaptation, non-parametric forms allow activations to be flexibly shaped by the data itself. Notably, kernel activation functions (KAFs) model each neuron's nonlinearity as a kernel expansion over a fixed dictionary:

$$g(z) = \sum_{n=1}^{D} \alpha_n \,\kappa(z, d_n),$$

where $d_n$ are dictionary points, $\alpha_n$ are trainable coefficients, and $\kappa$ is a (complex or real) positive-definite kernel. In complex-valued neural networks (CVNNs), both split (acting separately on $\Re\{z\}$ and $\Im\{z\}$) and fully complex kernel expansions are realized (Scardapane et al., 2018). Such KAFs enable universal function approximation, significantly exceeding the flexibility of classical or mildly parameterized alternatives.
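
The sketch below implements a real-valued Gaussian-kernel KAF under assumed hyperparameters (dictionary bound, kernel width); a split-complex variant would simply apply two such modules to $\Re\{z\}$ and $\Im\{z\}$:

```python
import torch
import torch.nn as nn

class KAF(nn.Module):
    """Illustrative kernel activation function with a Gaussian kernel.

    Each of `num_units` neurons has its own mixing coefficients over a fixed
    dictionary of D points sampled uniformly in [-bound, bound].
    """
    def __init__(self, num_units, D=20, bound=3.0, gamma=1.0):
        super().__init__()
        self.register_buffer("dictionary", torch.linspace(-bound, bound, D))  # (D,)
        self.gamma = gamma
        self.alpha = nn.Parameter(0.1 * torch.randn(num_units, D))  # trainable coefficients

    def forward(self, z):
        # z: (..., num_units); kernel matrix K: (..., num_units, D)
        K = torch.exp(-self.gamma * (z.unsqueeze(-1) - self.dictionary) ** 2)
        return (K * self.alpha).sum(dim=-1)
```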

The widely linear KAF (WL-KAF) further boosts representational power by including pseudo-kernels that capture noncircular (improper) or more general joint properties of complex arguments, all without increasing parameter count, and empirically improves both accuracy and convergence on FFT-based image tasks (Scardapane et al., 2019).

Empirical results consistently show that split and fully complex KAFs, and especially WL-KAFs, outperform fixed and simple adaptive activations on regression, time-series, and classification tasks in the complex domain.

3. Complex-Valued Activation Functions: Structural and Analytical Principles

Designing activation functions over $\mathbb{C}$ entails fundamental constraints: by Liouville's theorem, any function that is entire (analytic on all of $\mathbb{C}$) and bounded must be constant. Consequently, complex-valued activation function (CVAF) design decomposes into three broad classes (Hammad, 2024):

  • Fully complex analytic activations: e.g., $\tanh(z)$, $\sigma(z) = 1/(1+e^{-z})$, $e^{z}$. These preserve the Cauchy–Riemann (CR) conditions but are unbounded or have domain-restricted analyticity due to singularities. Useful for restricted domains where magnitude control is possible.
  • Split activations: apply real nonlinearities to $\Re\{z\}$ and $\Im\{z\}$ separately, e.g., $o_{\mathrm{split}}(z) = \sigma_r(\Re\{z\}) + i\,\sigma_i(\Im\{z\})$. Boundedness is possible (inheriting real bounds), but analyticity is lost.
  • Amplitude-phase activations: remap $|z|$ via a real nonlinearity while preserving $\arg(z)$, e.g., $o(z) = a(|z|)\,e^{i\arg(z)}$ with $a$ typically bounded and smooth.

Gradient computation for non-analytic CVAFs uses Wirtinger calculus, treating $z$ and $\bar{z}$ as independent variables during backpropagation.
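
As an illustration, the split and amplitude-phase classes can be written directly on PyTorch complex tensors, whose autograd applies Wirtinger-style differentiation to non-analytic maps (function names here are hypothetical):

```python
import torch

def split_tanh(z):
    """Split activation: a real nonlinearity applied to Re{z} and Im{z} separately.
    Bounded but non-analytic."""
    return torch.complex(torch.tanh(z.real), torch.tanh(z.imag))

def amplitude_phase(z, eps=1e-12):
    """Amplitude-phase activation: squash |z| with tanh, keep arg(z) unchanged."""
    r = torch.abs(z)
    return torch.tanh(r) * (z / (r + eps))

z = torch.randn(4, dtype=torch.cfloat, requires_grad=True)
loss = split_tanh(z).abs().sum() + amplitude_phase(z).abs().sum()
loss.backward()  # gradients obtained via Wirtinger (CR) calculus
```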

4. Piecewise Polynomial and Boolean-Logical Functionality

Highly nonlinear and composite activations such as Swish/SiLU, GeLU, and Mish, while empirically effective, are computation-heavy and problematic for cryptographically secure inference. The "Compact" framework constructs piecewise polynomial approximations for such functions, shifting approximation effort into high-density input regimes (e.g., $x \sim \mathcal{N}(0,1)$ after batch-norm) and optimally allocating approximation resources using simulated annealing (Islam et al., 2023). Compact achieves $<1\%$ accuracy loss while reducing MPC runtime for deep models by $2\times$–$5\times$.
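
As a rough illustration of the idea (not the published Compact procedure), one can fit a low-degree polynomial per interval while weighting the least-squares error by the standard normal density, so that accuracy concentrates where post-batch-norm inputs are likely:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))

def piecewise_poly_fit(f, breakpoints, degree=2, samples=2000):
    """Fit one polynomial per interval, weighting the fit by the N(0,1) density.
    A simplified stand-in for Compact's annealing-based resource allocation."""
    pieces = []
    for lo, hi in zip(breakpoints[:-1], breakpoints[1:]):
        x = np.linspace(lo, hi, samples)
        density = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
        # np.polyfit weights multiply residuals, so pass sqrt(density)
        coeffs = np.polyfit(x, f(x), degree, w=np.sqrt(density))
        pieces.append(((lo, hi), coeffs))
    return pieces

pieces = piecewise_poly_fit(gelu, breakpoints=[-6.0, -2.0, -0.5, 0.5, 2.0, 6.0])
```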

Another direction formalizes activation function computation as explicit probabilistic logic in logit space, deriving logit operators for AND, OR, and XNOR—and constructing computationally efficient “AIL” approximations using only addition, comparison, and sign operations (Lowe et al., 2021). These two-input activations generalize the geometric character of ReLU, with empirical performance matching or exceeding classical nonlinearities on a range of structured and symbolic tasks.
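
For concreteness, the exact logit-space AND under an independence assumption can be written as below; the AIL operators in the cited work replace the transcendental operations with additions, comparisons, and signs, which this sketch does not attempt to reproduce:

```python
import torch

def logit_and(a, b):
    """Exact probabilistic AND in logit space, assuming independent events:
    p = sigmoid(a) * sigmoid(b), returned as a logit."""
    p = torch.sigmoid(a) * torch.sigmoid(b)
    return torch.logit(p, eps=1e-6)

# Two pre-activation channels feeding one two-input logical unit.
a, b = torch.randn(8, 16), torch.randn(8, 16)
out = logit_and(a, b)
```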

5. Parameterized Activation Families: EIS and CDF-Based Adaptation

The EIS family unifies and extends a wide class of nonlinearity types via five hyperparameters:

$$\mathrm{EIS}(x; \alpha, \beta, \gamma, \delta, \theta) = \frac{x\,(\ln(1+e^{x}))^{\alpha}}{\sqrt{\beta + \gamma x^{2}} + \delta e^{-\theta x}},$$

controlling Softplus-like smoothing, ISRU-like root dampening, and exponential negative-tail shaping (Biswas et al., 2020). Specializations recover Softplus, Swish, ISRU, and the identity, and empirical results on standard image benchmarks show that canonical EIS parameter choices outperform ReLU and Swish by up to $2.3\%$ in top-1 accuracy. EIS variants are analytically differentiable and computationally tractable on modern hardware.
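
A direct transcription of the EIS formula above; the default parameters below are chosen only so that the expression reduces to Swish, as a sanity check, and are not the paper's recommended settings:

```python
import torch
import torch.nn.functional as F

def eis(x, alpha=0.0, beta=1.0, gamma=0.0, delta=1.0, theta=1.0):
    """EIS activation; softplus gives a numerically stable ln(1 + e^x).
    With alpha=0, beta=1, gamma=0, delta=1, theta=1 it reduces to x * sigmoid(x)."""
    num = x * F.softplus(x) ** alpha
    den = torch.sqrt(beta + gamma * x ** 2) + delta * torch.exp(-theta * x)
    return num / den

x = torch.linspace(-4, 4, 9)
assert torch.allclose(eis(x), x * torch.sigmoid(x), atol=1e-6)  # Swish special case
```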

Adaptation via cumulative distribution function (CDF) families introduces a learnable shape parameter (e.g., $\alpha$ in adaptive Gumbel activations), interpolating between logistic, Gumbel, and smoothed-ReLU forms (Farhadi et al., 2019). This approach enables per-neuron tuning of skewness and smoothness. Empirically, adaptation yields improved or at least non-inferior final accuracy, accelerated convergence, and improved early-layer representational capacity.
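
A sketch of such a CDF-family activation with a learnable shape parameter, assuming an Aranda-Ordaz-type form $1 - (1+\alpha e^{x})^{-1/\alpha}$ (logistic CDF at $\alpha = 1$, a Gumbel-type CDF as $\alpha \to 0$); the cited paper's exact parameterization may differ:

```python
import torch
import torch.nn as nn

class AdaptiveCDF(nn.Module):
    """CDF-family activation with a learnable, positive shape parameter alpha.
    Assumed form: 1 - (1 + alpha * exp(x)) ** (-1 / alpha)."""
    def __init__(self, alpha_init=1.0):
        super().__init__()
        # log-parameterize so alpha stays positive during training
        self.log_alpha = nn.Parameter(torch.tensor(float(alpha_init)).log())

    def forward(self, x):
        alpha = self.log_alpha.exp()
        x = x.clamp(max=20.0)  # avoid overflow in exp for large inputs
        return 1.0 - (1.0 + alpha * torch.exp(x)).pow(-1.0 / alpha)
```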

6. Implementation and Practical Considerations

Integrating complex or adaptive activation functions into modern frameworks typically involves minimal code modification:

  • For parameterized activations (e.g., AReLU), the trainable scalars or vectors are added as parameters to the layer and handled by the optimizer identically to weights and biases (Hu et al., 2021).
  • KAF and WL-KAF require establishing kernel dictionaries and maintaining per-neuron mixing coefficients, with computational cost scaling linearly or quadratically with dictionary size, but remaining tractable for typical $D$ ($\sim 10$–$20$ for split, $\sim 6$–$10$ for complex) (Scardapane et al., 2018, Scardapane et al., 2019).
  • Non-analytic or complex-valued activations utilize Wirtinger (CR-) calculus, now directly supported in frameworks that allow complex automatic differentiation, such as Autograd or torch.complex (Hammad, 2024).
  • Piecewise polynomial approximations select partition granularity ($m$), polynomial degree ($k$), and quantization ring size ($R$) to balance MPC runtime against accuracy constraints (Islam et al., 2023).

Regularization of additional parameters, normalization strategies for non-monotonic outputs, and architectural adjustments (e.g., channel duplication for multi-argument activations) may be required for stable training.

7. Empirical Performance, Tradeoffs, and Emerging Directions

Across broad empirical studies:

  • Layerwise parameterized activations consistently match or exceed the performance of both fixed and moderately adaptive functions, with significant convergence acceleration (Hu et al., 2021).
  • Kernel-based and non-parametric activations offer strong gains in settings with structured or complex-valued data, though with increased parameter and computation count per neuron (Scardapane et al., 2018, Scardapane et al., 2019).
  • Secure computation using piecewise polynomial approximation preserves near-identical accuracy while dramatically reducing runtime cost in privacy-sensitive applications (Islam et al., 2023).
  • Boolean-logical and two-input activations suit structured reasoning and learning of composite features, shifting the design paradigm beyond conventional 1→1 nonlinearities (Lowe et al., 2021).
  • CDF-based adaptive activations provide per-neuron curvature tuning that improves representation, most notably in early layers, and often accelerates convergence (Farhadi et al., 2019).

Limitations include increased implementation complexity for KAFs and amplitude-phase CVAFs, risk of overfitting with excessive adaptivity, and remaining open theory on optimal parameterization for generalization and interpretability.


In sum, complex activation functions now encompass a diverse and rapidly evolving family of methods—parameterized, non-parametric, kernel-based, polynomial, logical, adaptive, and complex-valued—supporting the increasing demands of modern neural networks for expressivity, efficiency, robustness, and domain compatibility. Continued research is focused on scalable adaptive parameterizations, better theory for analysis/generalization in the complex domain, and tighter integration of functional approximation with hardware and privacy constraints (Hu et al., 2021, Scardapane et al., 2018, Scardapane et al., 2019, Hammad, 2024, Islam et al., 2023, Lowe et al., 2021, Biswas et al., 2020, Farhadi et al., 2019).
