- The paper introduces using ReLU as both an activation and classification function in DNNs, challenging the conventional use of Softmax.
- Experiments on FFNNs and CNNs with MNIST, Fashion-MNIST, and WDBC show that while FFNN-ReLU achieves results comparable to its Softmax counterpart, CNN-ReLU converges more slowly and records slightly lower accuracy.
- The study highlights benefits such as simplified backpropagation and reduced computational complexity, paving the way for future research using ReLU variants.
Deep Learning using Rectified Linear Units (ReLU)
Abien Fred M. Agarap's paper presents an approach to using Rectified Linear Units (ReLU) as the classification function in deep neural networks (DNNs). Traditionally, ReLU serves as an activation function in hidden layers, while the Softmax function handles the final classification layer. This paper instead places ReLU at the classification layer and examines its efficacy against Softmax-based methods.
Methodology and Implementation
The paper employs ReLU as both an activation and a classification function. The implementation follows the standard procedure of obtaining the raw scores $o_i$ by multiplying the activation of the penultimate layer $h_{n-1}$ with the weight parameters $\theta$. These raw scores are then thresholded with the ReLU function, $f(o_i) = \max(0, o_i)$, and the final class prediction is taken as the argmax of the ReLU outputs, $\hat{y} = \arg\max_i f(o_i)$.
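As a minimal sketch of this prediction rule (written in NumPy, since the summary does not reproduce the paper's code; the array names and shapes are illustrative assumptions):

```python
import numpy as np

# Illustrative shapes: a batch of 4 examples, a 128-unit penultimate layer,
# and 10 classes. h_penultimate and theta stand in for h_{n-1} and the
# classification-layer weights.
rng = np.random.default_rng(0)
h_penultimate = rng.standard_normal((4, 128))
theta = rng.standard_normal((128, 10))

scores = h_penultimate @ theta           # raw scores o_i
relu_scores = np.maximum(0.0, scores)    # f(o_i) = max(0, o_i)
y_hat = relu_scores.argmax(axis=1)       # predicted class per example
print(y_hat)
```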
The experiments were carried out using a feed-forward neural network (FFNN) and a convolutional neural network (CNN). Both architectures were implemented using Keras with a TensorFlow backend. Three datasets were utilized to evaluate the models: MNIST, Fashion-MNIST, and the Wisconsin Diagnostic Breast Cancer (WDBC).
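A rough Keras sketch of the FFNN-ReLU setup on MNIST is shown below. The layer widths, optimizer, epoch count, and the choice to feed the ReLU-thresholded scores into a cross-entropy loss as logits are assumptions made here for illustration; the summary does not specify the paper's exact hyperparameters or training loss.

```python
import tensorflow as tf

# Load and scale MNIST to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# FFNN with ReLU in the hidden layers and, unconventionally, at the
# classification layer as well (hyperparameters are illustrative).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),  # ReLU as the classification function
])

# Assumption: treat the ReLU outputs as logits for a cross-entropy loss;
# the class prediction itself is the argmax of these outputs.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
y_pred = model.predict(x_test).argmax(axis=1)
```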
Results
- MNIST and Fashion-MNIST Classification:
- The FFNN-ReLU model exhibited comparable performance to FFNN-Softmax, achieving similar precision, recall, and F1-scores, albeit with slightly lower test accuracy.
- For CNN-based models, however, the ReLU-based classifier showed slower convergence and lower accuracy than its Softmax-based counterpart. Despite this, the CNN-ReLU model still exceeded 90% test accuracy on both datasets, which demonstrates its viability.
- WDBC Classification:
- The FFNN-ReLU model also yielded comparable results to the FFNN-Softmax, reinforcing the assertion of ReLU-based classifiers being a potential alternative to traditional Softmax classification.
Analysis and Implications
One noteworthy observation was the slower convergence of the ReLU-based CNN models, which hurt their performance relative to the Softmax-based models. This slowdown is attributed to the "dying" ReLU problem: because ReLU outputs zero and has a zero gradient for negative pre-activations, a unit that falls into that regime receives no learning signal and can stop updating altogether.
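The effect can be seen in a toy TensorFlow example: once a unit's pre-activation is negative, both its output and its gradient are exactly zero, so the weight driving it gets no update from that example (the values below are made up for illustration).

```python
import tensorflow as tf

# A hypothetical weight that drives the unit's pre-activation negative.
w = tf.Variable(-2.0)
x = tf.constant(1.5)

with tf.GradientTape() as tape:
    pre_activation = w * x               # -3.0, i.e. negative
    out = tf.nn.relu(pre_activation)     # clamped to 0.0

grad = tape.gradient(out, w)
print(out.numpy(), grad.numpy())         # 0.0 0.0 -> no learning signal for w
```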
Despite this limitation, the comparable performance of ReLU-based models in FFNN configurations points to possible advantages: simplified backpropagation, reduced computational complexity, and a more consistent gradient flow, since ReLU is linear with a constant unit gradient wherever its input is positive.
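The intuition behind the simpler backward pass can be written out directly: the derivative of the ReLU output with respect to its own score is a sparse 0/1 indicator, whereas Softmax has a dense Jacobian coupling every pair of class scores (standard results, not taken from the paper's derivation).

```latex
% ReLU classification layer: each output depends only on its own score.
\frac{\partial f(o_i)}{\partial o_j} =
  \begin{cases}
    1, & i = j \text{ and } o_i > 0, \\
    0, & \text{otherwise,}
  \end{cases}
\qquad
% Softmax: a dense Jacobian coupling all scores.
\frac{\partial p_i}{\partial o_j} = p_i \left( \delta_{ij} - p_j \right),
\quad p_i = \frac{e^{o_i}}{\sum_k e^{o_k}}.
```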
Future Directions
The paper suggests exploring enhancements to mitigate dying ReLU neurons, such as the integration of ReLU variants like Parametric ReLU (PReLU) or Leaky ReLU. These variants could potentially overcome the limitations encountered in this paper.
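A brief Keras sketch of what such a swap could look like at the classification layer follows; the 10-class width and the 0.1 negative slope are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf

# Leaky ReLU head: keeps a small, fixed negative slope so scores below zero
# still pass a gradient instead of being clamped flat.
leaky_head = tf.keras.Sequential([
    tf.keras.layers.Dense(10),            # raw class scores
    tf.keras.layers.LeakyReLU(0.1),       # negative slope of 0.1 (assumed)
])

# Parametric ReLU head: the negative slope is learned during training.
prelu_head = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.PReLU(),
])
```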
Another area of future research could involve detailed numerical inspection of gradient behavior in DL-ReLU models versus DL-Softmax models. This could provide a deeper understanding of the dynamics involved and enable the development of more robust ReLU-based classifiers.
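One way such an inspection might be set up (a sketch assuming `model`, `loss_fn`, and a batch `(x, y)` are already defined; this is not code from the paper) is to record per-layer gradient norms during a training step and compare the two classifier types side by side.

```python
import tensorflow as tf

def gradient_norms(model, loss_fn, x, y):
    """Return the gradient norm of the loss w.r.t. each trainable variable."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    return {
        v.name: (tf.norm(g).numpy() if g is not None else 0.0)
        for v, g in zip(model.trainable_variables, grads)
    }
```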
Conclusion
This paper contributes useful insights into using ReLU as a classification function in deep learning models and sets the stage for future work on optimizing ReLU-based classifiers. While the traditional Softmax function remains robust and effective, the simplicity and computational benefits of ReLU in the classification role warrant further investigation and could lead to notable gains in neural network design and efficiency.