- The paper presents a unified framework using Mittag-Leffler functions to emulate popular activation functions like ReLU and Sigmoid.
- The approach mitigates issues such as vanishing and exploding gradients while enabling efficient backpropagation via closed-form differentiation.
- Experimental results on datasets from MNIST to ImageNet demonstrate competitive accuracy and improved adaptability in neural network training.
Unification of Popular Artificial Neural Network Activation Functions
The paper "Unification of Popular Artificial Neural Network Activation Functions" by Mohammad Mostafanejad, presents a method for simplifying the exploration and use of activation functions within artificial neural networks (ANNs). By formulating a unified approach with the aid of Mittag-Leffler functions from fractional calculus, the research bridges multiple fixed-form activation functions into a singular representation that offers both interpretability and adaptability.
Key Contributions
The major contribution of this paper is the proposed unified framework for activation functions in neural networks. Specifically, the research uses Mittag-Leffler functions to construct a new "gated" representation capable of emulating well-established activation functions such as ReLU and Sigmoid. Fixed-shape activations often suffer from issues such as vanishing or exploding gradients and output bias. The unified function transitions smoothly between these forms as its parameters vary, which mitigates these issues and offers configurations that adapt to the characteristics of the training data.
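To make the idea concrete, the following is a minimal NumPy sketch, not the paper's exact parameterization: it truncates the two-parameter Mittag-Leffler series E_{alpha,beta}(z) = sum_k z^k / Gamma(alpha*k + beta) and uses it in a hypothetical gated form f(x) = x / (1 + E_{alpha,beta}(-scale*x)). With alpha = beta = scale = 1 this reduces to SiLU, i.e. x * sigmoid(x), since E_{1,1}(-x) = exp(-x), while a larger `scale` sharpens the gate toward ReLU-like behavior; the function names and the `scale` parameter are illustrative assumptions, not the paper's notation.

```python
import numpy as np
from scipy.special import gamma

def mittag_leffler(z, alpha, beta, n_terms=80):
    """Truncated series E_{alpha,beta}(z) = sum_k z**k / Gamma(alpha*k + beta).
    Adequate for moderate |z|; large arguments need a dedicated routine."""
    z = np.asarray(z, dtype=float)
    result = np.zeros_like(z)
    for k in range(n_terms):
        result += z**k / gamma(alpha * k + beta)
    return result

def unified_activation(x, alpha=1.0, beta=1.0, scale=1.0):
    """Hypothetical gated form f(x) = x / (1 + E_{alpha,beta}(-scale * x)).
    alpha = beta = scale = 1 gives SiLU, x * sigmoid(x), since E_{1,1}(-x) = exp(-x);
    larger `scale` pushes the gate toward a hard on/off switch, i.e. ReLU-like."""
    x = np.asarray(x, dtype=float)
    return x / (1.0 + mittag_leffler(-scale * x, alpha, beta))

x = np.linspace(-4.0, 4.0, 9)
print(np.round(unified_activation(x), 4))              # SiLU-like values
print(np.round(unified_activation(x, scale=5.0), 4))   # close to max(0, x) on this range
```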
Analytical and Numerical Assessment
The theoretical grounding in fractional calculus ensures that the derivatives of these functions remain computationally tractable for backpropagation. This closed-form differentiation makes the unified functions practical for training deep neural networks efficiently.
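As an illustration of how such an activation slots into gradient-based training, here is a hedged PyTorch sketch under the same assumed gated form as above: the truncated series is built from differentiable operations, so backpropagation reaches both the inputs and the learnable shape parameters. The module name `UnifiedActivation` and its parameters are assumptions made for illustration; a production implementation would exploit the closed-form derivative directly (for example through a custom autograd function) rather than differentiating through the series.

```python
import torch

def mittag_leffler_t(z, alpha, beta, n_terms=64):
    """Truncated Mittag-Leffler series built from differentiable torch ops."""
    result = torch.ones_like(z) / torch.exp(torch.lgamma(beta))  # k = 0 term: 1 / Gamma(beta)
    zk = torch.ones_like(z)                                      # running z**k
    for k in range(1, n_terms):
        zk = zk * z
        result = result + zk / torch.exp(torch.lgamma(alpha * k + beta))
    return result

class UnifiedActivation(torch.nn.Module):
    """Hypothetical gated activation f(x) = x / (1 + E_{alpha,beta}(-scale * x))
    with learnable shape parameters, so its form can adapt during training."""
    def __init__(self, alpha=1.0, beta=1.0, scale=1.0):
        super().__init__()
        self.alpha = torch.nn.Parameter(torch.tensor(alpha))
        self.beta = torch.nn.Parameter(torch.tensor(beta))
        self.scale = torch.nn.Parameter(torch.tensor(scale))

    def forward(self, x):
        return x / (1.0 + mittag_leffler_t(-self.scale * x, self.alpha, self.beta))

act = UnifiedActivation()
x = torch.linspace(-3.0, 3.0, 8, requires_grad=True)
loss = act(x).sum()
loss.backward()        # gradients flow to x and to alpha, beta, scale
print(x.grad)
print(act.scale.grad)
```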
A series of experiments was carried out across datasets and network architectures of varying complexity, including LeNet-5 on MNIST and CIFAR-10, and ShuffleNet-v2 and ResNet-101 on ImageNet-1k. The results show that the unified activation functions perform competitively with traditional fixed-shape activation functions, delivering comparable accuracy with only minimal differences in runtime.
Implications and Future Trajectories
From a practical standpoint, this unification reduces the implementation complexity of choosing among numerous activation functions, which is especially beneficial for architectures that must adapt to varying data inputs. The technique's suitability for integration with modern machine learning frameworks further underscores its practicality, as illustrated in the sketch below.
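For instance, under the same assumptions as the sketches above, the hypothetical UnifiedActivation module can replace a fixed activation in an ordinary PyTorch model, with its shape parameters optimized alongside the network weights; the layer sizes and the MNIST-shaped random batch below are purely illustrative.

```python
import torch
import torch.nn as nn

# Drop-in replacement for a fixed activation, reusing the hypothetical
# UnifiedActivation module from the sketch above; its shape parameters
# (alpha, beta, scale) are trained together with the network weights.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 128),
    UnifiedActivation(),          # where nn.ReLU() would normally go
    nn.Linear(128, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

images = torch.randn(32, 1, 28, 28)          # stand-in for an MNIST mini-batch
labels = torch.randint(0, 10, (32,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(loss.item())
```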
The paper also opens avenues for deeper exploration of the role of fractional calculus in machine learning, particularly how special functions can unify or even replace conventional components of neural network design. Future work could extend this approach by investigating Mittag-Leffler functions in dynamic networks, possibly including recurrent variants and activation functions whose shapes evolve during training.
In conclusion, this work promises valuable advances by reducing computational burden and enhancing the robustness of ANNs, contributing to both the theoretical understanding and the practical efficiency of neural network design. As machine learning applications continue to grow in complexity, methodologies that promote adaptability, such as the one proposed here, will become increasingly valuable.