- The paper presents FReLU as an extension of ReLU by integrating a learnable bias, enabling the capture of informative negative activations.
- Experimental evaluations on CIFAR-10, CIFAR-100, and ImageNet show that FReLU achieves faster convergence and lower error rates compared to traditional ReLU and its variants.
- FReLU maintains compatibility with batch normalization and exhibits low computational cost, making it versatile for various CNN architectures.
FReLU: Flexible Rectified Linear Units for Improving Convolutional Neural Networks
The paper introduces Flexible Rectified Linear Units (FReLU), an activation function designed to improve the performance of convolutional neural networks (CNNs). The work addresses a key limitation of the traditional Rectified Linear Unit (ReLU): its zero-hard rectification discards negative pre-activations that may carry useful information.
Key Contributions
The researchers propose FReLU as an extension of ReLU by introducing a learnable parameter to modulate the rectification point. This innovation aims to harness the information present in the negative spectrum of activations, offering a simple yet computationally efficient solution. The key attributes of FReLU include:
- Fast Convergence: FReLU is shown to achieve quicker convergence rates compared to its predecessors.
- Low Computational Cost: FReLU involves no exponential computation, keeping its cost close to that of plain ReLU.
- Compatibility with Batch Normalization: Unlike some activation functions such as Exponential Linear Units (ELUs), FReLU maintains compatibility with batch normalization methods.
- Self-Adapting Characteristics: FReLU does not depend on strict distributional assumptions, making it versatile across various network architectures.
Methodological Framework
FReLU modifies the ReLU activation function by introducing a learnable bias parameter, allowing the function to transition dynamically between positive and relevant negative values. Mathematically, this is expressed as:
frelu(x) = relu(x) + b = max(0, x) + b
where b is a layer-wise learnable parameter. When b is negative, the function can produce informative negative outputs, extending the output range from [0, +∞) to [b, +∞) and increasing the expressive power of the network while retaining the hard, one-sided saturation that underlies ReLU's sparse characteristics.
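As a concrete illustration, below is a minimal sketch of this formulation as a PyTorch module, assuming a single learnable scalar b per layer as described above. The module name FReLU, the initialization value of b, and the choice of PyTorch are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class FReLU(nn.Module):
    """frelu(x) = relu(x) + b, with b a learnable layer-wise bias (sketch)."""

    def __init__(self, init_b: float = -1.0):  # init value is an assumption
        super().__init__()
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output range becomes [b, +inf) instead of ReLU's [0, +inf).
        return torch.relu(x) + self.b


if __name__ == "__main__":
    act = FReLU()
    x = torch.randn(4, 16, 32, 32)
    y = act(x)
    print(y.shape, y.min().item())  # minimum is bounded below by b
```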
Experimental Evaluation
The authors conducted comprehensive evaluations of FReLU using three standard image classification datasets: CIFAR-10, CIFAR-100, and ImageNet. The findings consistently demonstrated that FReLU outperformed traditional ReLU as well as contemporary variants such as Parametric ReLU (PReLU) and Exponential Linear Units (ELUs) across different architectures, including both plain and residual networks.
In deeper networks, FReLU achieved lower error rates than PReLU and ELU on the more challenging CIFAR-100 benchmark, and its compatibility with batch normalization in these normalization-heavy architectures underscores its practicality for real-world applications.
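To make the batch-normalization point concrete, the sketch below shows FReLU used as a drop-in replacement for ReLU in a generic Conv → BatchNorm → activation block. The block structure, layer sizes, and the helper name conv_bn_frelu are illustrative assumptions rather than the exact architectures evaluated in the paper; FReLU is the module sketched earlier, repeated here so the snippet runs on its own.

```python
import torch
import torch.nn as nn


class FReLU(nn.Module):
    """frelu(x) = relu(x) + b with a learnable layer-wise bias (see earlier sketch)."""

    def __init__(self, init_b: float = -1.0):
        super().__init__()
        self.b = nn.Parameter(torch.tensor(init_b))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x) + self.b


def conv_bn_frelu(in_ch: int, out_ch: int) -> nn.Sequential:
    # Generic Conv -> BatchNorm -> FReLU block: FReLU simply replaces nn.ReLU().
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_ch),
        FReLU(),
    )


block = conv_bn_frelu(3, 32)
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 32, 32, 32])
```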
Implications and Future Directions
The implications of adopting FReLU in CNNs extend into both practical and theoretical domains. Practically, FReLU offers a computationally inexpensive way to build more robust models that exploit informative negative activations rather than discarding them.
From a theoretical standpoint, this work challenges the conventional understanding of activation functions, advocating for the reevaluation of the role of negative values in neural network design. Future research could focus on further strategies to tackle the dead neuron problem and explore other innovative activation architectures that balance expressiveness with learning efficiency.
Conclusion
This paper makes significant strides in activation function design by proposing FReLU, which promises notable improvements in terms of convergence speed, performance, and computational efficiency. As neural network architectures continue to evolve, the adaptive and flexible nature of FReLU positions it as a valuable tool in the ongoing enhancement of CNN capabilities. Future investigations are likely to expand upon these foundations, exploring deeper integrations of adaptation mechanisms in activation functions.