- The paper's main contribution is a comprehensive survey showing that established activation functions like ReLU and Softmax dominate in practice over novel alternatives.
- The paper examines various activation functions through detailed analysis of their mathematical formulations and computational efficiency, offering actionable insights for model selection.
- The study highlights that integrating adaptive and complex activation functions in future research may enhance deep learning network performance.
A Comparative Study of Activation Functions in Deep Learning
The paper "Activation Functions: Comparison of Trends in Practice and Research for Deep Learning" by Chigozie Enyinna Nwankpa, Winifred Ijomah, Anthony Gachagan, and Stephen Marshall offers a comprehensive review and comparative analysis of activation functions (AFs) employed in deep learning (DL) architectures. This paper not only surveys the existing AFs but also juxtaposes their usage in practical DL deployments against the state-of-the-art research outcomes.
Overview
The research contextualizes the critical role of activation functions in transforming raw input data into higher-level abstract representations within deep neural networks (DNNs). These functions are pivotal for introducing non-linearity into the model, which is fundamental to solving complex problems across various domains such as classification, detection, and segmentation.
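To make the role of non-linearity concrete, here is a minimal NumPy sketch (an illustration written for this summary, not code from the paper; the layer sizes and random weights are arbitrary assumptions) showing that stacking linear layers without an activation function collapses into a single linear transformation, whereas inserting a ReLU between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))     # small batch of inputs (arbitrary sizes)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 3))

# Two stacked linear layers are equivalent to one linear layer with weights W1 @ W2.
linear_stack = x @ W1 @ W2
collapsed = x @ (W1 @ W2)
print(np.allclose(linear_stack, collapsed))    # True: no added expressive power

# Inserting a ReLU between the layers breaks this equivalence.
nonlinear_stack = np.maximum(0, x @ W1) @ W2
print(np.allclose(nonlinear_stack, collapsed))  # False: the non-linearity matters
```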
Activation Functions (AFs) Explored
The paper categorizes and examines numerous AFs and their variants, emphasizing their mathematical formulation, operational specifics, and application scenarios (a short code sketch of these functions follows the list):
- Sigmoid and its Variants:
- Sigmoid: Defined by σ(x) = 1 / (1 + e^(-x)), commonly used in binary classification.
- Hard Sigmoid: Offers reduced computational cost due to its simpler form hard_sigmoid(x) = clip((x + 1) / 2, 0, 1).
- SiLU and dSiLU: The sigmoid-weighted linear unit silu(x) = x·σ(x) and its derivative dSiLU combine sigmoid and linear behaviour, and were proposed for reinforcement learning applications.
- Hyperbolic Tangent (Tanh) and Hard Tanh:
- Tanh: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)), preferable in scenarios requiring zero-centered output for faster convergence.
- Hard Tanh: A simplified, computationally efficient piecewise-linear variant of tanh.
- Softmax and Softsign:
- Softmax: Vital in multi-class classification, with the formula softmax(x_i) = e^(x_i) / Σ_j e^(x_j).
- Softsign: Offers polynomial convergence, represented by softsign(x) = x / (1 + |x|).
- Rectified Linear Unit (ReLU) and its Variants:
- ReLU: relu(x)=max(0,x), widely adopted for its simplicity and effectiveness in mitigating the vanishing gradient problem.
- Variants: Include Leaky ReLU (LReLU), Parametric ReLU (PReLU), and Randomized Leaky ReLU (RReLU), which introduce a fixed, learnable, or randomized slope in the negative region to address drawbacks such as dead neurons.
- Exponential Linear Unit (ELU) and its Variants:
- ELU: elu(x) = x if x > 0, else α(e^x - 1); provides faster learning and reduces bias shifts.
- PELU and SELU: PELU introduces learnable parameters, while SELU adds fixed scaling constants that give the network self-normalizing properties.
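As referenced above, the following NumPy sketch collects plain implementations of the surveyed functions. It is a minimal illustration written for this summary rather than code from the paper; the function names, default slopes, and the SELU constants shown are assumptions following common conventions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hard_sigmoid(x):
    return np.clip((x + 1.0) / 2.0, 0.0, 1.0)   # clip((x + 1) / 2, 0, 1)

def silu(x):
    return x * sigmoid(x)                        # sigmoid-weighted linear unit

def hard_tanh(x):
    return np.clip(x, -1.0, 1.0)

def softmax(x):
    # subtract the row-wise max for numerical stability, normalize over the last axis
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def softsign(x):
    return x / (1.0 + np.abs(x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def selu(x, alpha=1.6733, lam=1.0507):
    # scaled ELU with fixed constants that yield self-normalizing behaviour
    return lam * np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

# quick sanity check on a small input vector
z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z))
print(softmax(z))
```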
Trends in Practice
The survey outlines that despite the plethora of novel AFs proposed in the literature, practical deep learning applications predominantly utilize the ReLU and Softmax functions. This is evidenced by their adoption in several winning architectures of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), such as AlexNet, VGGNet, GoogLeNet, and ResNet. ReLU is prevalent in hidden layers due to its computational efficiency and ability to mitigate the vanishing gradient problem, whereas Softmax remains the standard for output layers in classification networks.
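As a concrete illustration of this pattern, the sketch below is a hypothetical forward pass written for this summary (not an architecture from the paper; the layer sizes and random weights are arbitrary): ReLU in the hidden layers, Softmax at the output producing class probabilities.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def forward(x, params):
    """Forward pass of a small classifier: ReLU hidden layers, Softmax output."""
    h = x
    for W, b in params[:-1]:
        h = relu(h @ W + b)              # hidden layers use ReLU
    W_out, b_out = params[-1]
    return softmax(h @ W_out + b_out)    # output layer uses Softmax

# hypothetical 64 -> 32 -> 10 classifier with random weights
rng = np.random.default_rng(0)
params = [(rng.normal(scale=0.1, size=(64, 32)), np.zeros(32)),
          (rng.normal(scale=0.1, size=(32, 10)), np.zeros(10))]
probs = forward(rng.normal(size=(5, 64)), params)
print(probs.shape, probs.sum(axis=-1))   # (5, 10), each row sums to 1
```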
Practical and Theoretical Implications
The continuous innovation in activation functions aims to address the persistent issues of vanishing and exploding gradients, improve learning efficiency, and enhance generalization capabilities. Compound AFs like Swish and recent entries like ELiSH signal an evolution toward more complex, adaptive functions capable of learning intricate data representations.
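For reference, a minimal sketch of these compound functions is given below. The Swish form x·σ(βx) reduces to the SiLU when β = 1, and the ELiSH formula shown follows the commonly cited definition; treat both as illustrative assumptions rather than the paper's exact notation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # Swish: x * sigmoid(beta * x); with beta = 1 this coincides with the SiLU
    return x * sigmoid(beta * x)

def elish(x):
    # ELiSH gates an identity / ELU-like response with a sigmoid
    # (commonly cited definition; treated here as an assumption)
    return np.where(x >= 0, x * sigmoid(x), (np.exp(x) - 1.0) * sigmoid(x))

z = np.linspace(-3, 3, 7)
print(swish(z))
print(elish(z))
```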
Future research is anticipated to explore the integration of newly developed AFs within existing state-of-the-art architectures, potentially optimizing their performance further. This investigation could involve empirical validations across diverse datasets and DL models to substantiate the theoretical advancements proposed.
Conclusion
This paper serves as an essential resource by cataloging a wide array of activation functions, analyzing their trends in practical deployment, and surveying their theoretical advancements. While newer AFs show promise, practical applications continue to rely on the robustness and reliability of established functions like ReLU and Softmax, underscoring the cautious adoption of novel AFs in real-world DL applications. Future research is poised to uncover potential gains from next-generation AFs, driving forward the capabilities of deep learning systems.