Improving Deep Neural Networks with Multiple Parametric Exponential Linear Units
The paper presents a novel activation function, the Multiple Parametric Exponential Linear Unit (MPELU), aimed at improving the performance and flexibility of deep neural networks. To bridge the gap between rectified and exponential linear units, MPELU combines the strengths of the Parametric Rectified Linear Unit (PReLU) and the Exponential Linear Unit (ELU) through learnable parameters. This lets MPELU adjust dynamically between the rectified and exponential regimes, giving it greater representational power and adaptability and enabling faster convergence in deep networks.
Core Contributions
- Introduction of MPELU: The authors propose MPELU as a generalized activation function with two learnable parameters, α and β, which control the shape of the negative part of the activation, allowing it to range from nearly linear (PReLU-like) to exponential (ELU-like); the parameters can be learned per channel or shared across channels. MPELU is defined as:
$$
f(y_i) = \begin{cases} y_i, & \text{if } y_i > 0 \\ \alpha_c \left(e^{\beta_c y_i} - 1\right), & \text{if } y_i \le 0 \end{cases}
$$
This formulation retains the qualities of PReLU and ELU and includes them as special cases, broadening the range of scenarios the activation can cover (a code sketch of the activation follows this list).
- Weight Initialization Method: The paper also proposes a weight initialization strategy tailored to networks that use exponential-linear-style activations. It is designed to preserve signal variance across layers, mitigating the vanishing gradients typically encountered in deeper architectures.
- Empirical Results: Networks equipped with MPELU outperform their counterparts on well-known benchmarks such as CIFAR-10 and CIFAR-100. Applying MPELU in deep residual architectures yields state-of-the-art results, backed by empirical analyses of convergence speed and generalization performance.
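To make the definition concrete, below is a minimal PyTorch-style sketch of MPELU with channel-wise learnable α and β. The module name, default initial values, and the assumption of NCHW inputs are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn


class MPELU(nn.Module):
    """Multiple Parametric Exponential Linear Unit (illustrative sketch).

    f(y) = y                               if y > 0
    f(y) = alpha * (exp(beta * y) - 1)     if y <= 0
    with alpha and beta learned per channel.
    """

    def __init__(self, num_channels: int, alpha_init: float = 1.0, beta_init: float = 1.0):
        super().__init__()
        # One (alpha, beta) pair per channel; a channel-shared variant would use a single scalar.
        self.alpha = nn.Parameter(torch.full((num_channels,), alpha_init))
        self.beta = nn.Parameter(torch.full((num_channels,), beta_init))

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # Broadcast the (C,) parameters over an NCHW tensor.
        alpha = self.alpha.view(1, -1, 1, 1)
        beta = self.beta.view(1, -1, 1, 1)
        # Clamp the exponent's input at 0 so the unselected branch of torch.where
        # cannot overflow for large positive y.
        neg = alpha * (torch.exp(beta * torch.clamp(y, max=0.0)) - 1.0)
        return torch.where(y > 0, y, neg)


# Example usage on a batch of 64-channel feature maps.
act = MPELU(num_channels=64)
out = act(torch.randn(8, 64, 32, 32))
```

With α = 1 and β = 1 the negative branch reduces to ELU, α = 0 gives ReLU, and for small β the branch is approximately α·β·y, i.e. a PReLU-like linear slope, which is how the special cases mentioned above arise.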
Key Findings
- Performance and Flexibility: MPELU improves classification accuracy across neural architectures of varying depth. Networks using MPELU converge faster during training and achieve better final generalization than those using conventional activation functions.
- Initialization Advantage: The tailored initialization method complements MPELU by enabling stable, effective training of very deep networks. It addresses a gap left by existing initialization schemes, which are derived for rectified rather than exponential-linear units (a sketch of the variance-preserving idea follows this list).
- Robustness with Batch Normalization: Unlike ELU, which can underperform when combined with Batch Normalization, MPELU maintains strong classification performance. This matters in practice, since Batch Normalization is a standard component for stabilizing network training.
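As a rough illustration of the variance-preserving idea behind such an initialization, the sketch below linearizes the negative branch as α·β·y near zero, which leads to a PReLU/He-style scaling of the weight variance. The resulting formula is an approximation chosen for illustration, and the function name is hypothetical; consult the paper for the exact derivation.

```python
import math

import torch.nn as nn


def mpelu_conv_init_(conv: nn.Conv2d, alpha: float = 1.0, beta: float = 1.0) -> None:
    """Hypothetical variance-preserving init for a conv layer followed by MPELU.

    Near zero, alpha * (exp(beta * y) - 1) ~= alpha * beta * y, so treating the
    negative branch as a linear slope of alpha * beta gives the PReLU-style
    scaling Var[w] = 2 / ((1 + (alpha * beta) ** 2) * fan_in).
    """
    fan_in = conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1]
    std = math.sqrt(2.0 / ((1.0 + (alpha * beta) ** 2) * fan_in))
    nn.init.normal_(conv.weight, mean=0.0, std=std)
    if conv.bias is not None:
        nn.init.zeros_(conv.bias)


# Example: initialize a 3x3 conv that feeds an MPELU with alpha = beta = 1.
conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
mpelu_conv_init_(conv, alpha=1.0, beta=1.0)
```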
Implications for AI Development
The design and implementation of MPELU signal a promising advance in refining neural network architectures. This work carries implications for future research focused on bridging the functional gaps between varying types of activation functions. Additionally, the weight initialization strategy provides guidance for designing adaptable, deep architectures that circumvent traditional training difficulties associated with exponential linear units.
Further work could explore adaptive mechanisms in which MPELU parameters adjust dynamically, independently of explicit training phases. This direction may unlock further efficiencies in network design, particularly in transfer learning or continual adaptation to new data streams.
In conclusion, the paper highlights the potential of adaptive, multi-regime activation functions to enhance neural network performance, reinforcing the importance of activation function design in the broader landscape of deep learning.