- The paper introduces using ReLU as both an activation and classification function in DNNs, challenging the conventional use of Softmax.
- Experiments on FFNNs and CNNs with MNIST, Fashion-MNIST, and WDBC show that while FFNN-ReLU achieves results comparable to its Softmax counterpart, CNN-ReLU converges more slowly and records slightly lower accuracy.
- The study highlights benefits such as simplified backpropagation and reduced computational complexity, paving the way for future research using ReLU variants.
Deep Learning using Rectified Linear Units (ReLU)
Abien Fred M. Agarap's paper presents an approach to using Rectified Linear Units (ReLU) as the classification function in deep neural networks (DNNs). Traditionally, ReLU serves as an activation function in hidden layers, while the Softmax function handles the final classification layer. This paper instead places ReLU at the classification layer and examines its efficacy against Softmax-based methods.
Methodology and Implementation
The paper employs ReLU as both an activation and a classification function. The implementation follows the standard procedure of obtaining the raw scores $o_i$ by multiplying the activation of the penultimate layer $h_{n-1}$ with the weight parameters $\theta$. These raw scores are then thresholded with the ReLU function, $f(o_i) = \max(0, o_i)$, and the final class prediction is taken as the argmax of the ReLU outputs, $\hat{y} = \arg\max_i f(o_i)$.
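As a minimal sketch of this prediction rule (written in NumPy, since the summary does not reproduce the paper's code; the array names and shapes are illustrative assumptions):

```python
import numpy as np

# Illustrative shapes: a batch of 4 examples, a 128-unit penultimate layer,
# and 10 classes. h_penultimate and theta stand in for h_{n-1} and the
# classification-layer weights.
rng = np.random.default_rng(0)
h_penultimate = rng.standard_normal((4, 128))
theta = rng.standard_normal((128, 10))

scores = h_penultimate @ theta           # raw scores o_i
relu_scores = np.maximum(0.0, scores)    # f(o_i) = max(0, o_i)
y_hat = relu_scores.argmax(axis=1)       # predicted class per example
print(y_hat)
```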
The experiments were carried out using a feed-forward neural network (FFNN) and a convolutional neural network (CNN). Both architectures were implemented using Keras with a TensorFlow backend. Three datasets were utilized to evaluate the models: MNIST, Fashion-MNIST, and the Wisconsin Diagnostic Breast Cancer (WDBC).
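A rough Keras sketch of the FFNN-ReLU setup on MNIST is shown below. The layer widths, optimizer, epoch count, and the choice to feed the ReLU-thresholded scores into a cross-entropy loss as logits are assumptions made here for illustration; the summary does not specify the paper's exact hyperparameters or training loss.

```python
import tensorflow as tf

# Load and scale MNIST to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# FFNN with ReLU in the hidden layers and, unconventionally, at the
# classification layer as well (hyperparameters are illustrative).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),  # ReLU as the classification function
])

# Assumption: treat the ReLU outputs as logits for a cross-entropy loss;
# the class prediction itself is the argmax of these outputs.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=5, batch_size=128, validation_split=0.1)
y_pred = model.predict(x_test).argmax(axis=1)
```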
Results
- MNIST and Fashion-MNIST Classification:
- The FFNN-ReLU model exhibited comparable performance to FFNN-Softmax, achieving similar precision, recall, and F1-scores, albeit with slightly lower test accuracy.
- For CNN-based models, however, the ReLU-based classifier showed slower convergence and lower accuracy than its Softmax-based counterpart. Despite this, the CNN-ReLU model still exceeded 90% test accuracy on both datasets, which demonstrates its viability.
- WDBC Classification:
- The FFNN-ReLU model also yielded comparable results to the FFNN-Softmax, reinforcing the assertion of ReLU-based classifiers being a potential alternative to traditional Softmax classification.
Analysis and Implications
One noteworthy observation was the slower convergence of the ReLU-based CNN models, which hurt their performance relative to the Softmax-based models. This slowdown is attributed to the "dying" ReLU problem: because ReLU outputs zero and has a zero gradient for negative pre-activations, a unit that falls into that regime receives no learning signal and can stop updating altogether.
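The effect can be seen in a toy TensorFlow example: once a unit's pre-activation is negative, both its output and its gradient are exactly zero, so the weight driving it gets no update from that example (the values below are made up for illustration).

```python
import tensorflow as tf

# A hypothetical weight that drives the unit's pre-activation negative.
w = tf.Variable(-2.0)
x = tf.constant(1.5)

with tf.GradientTape() as tape:
    pre_activation = w * x               # -3.0, i.e. negative
    out = tf.nn.relu(pre_activation)     # clamped to 0.0

grad = tape.gradient(out, w)
print(out.numpy(), grad.numpy())         # 0.0 0.0 -> no learning signal for w
```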
Despite this limitation, the comparable performance of ReLU-based models in FFNN configurations points to possible advantages: simplified backpropagation, reduced computational complexity, and a more consistent gradient flow, since ReLU is linear with a constant unit gradient wherever its input is positive.
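The intuition behind the simpler backward pass can be written out directly: the derivative of the ReLU output with respect to its own score is a sparse 0/1 indicator, whereas Softmax has a dense Jacobian coupling every pair of class scores (standard results, not taken from the paper's derivation).

```latex
% ReLU classification layer: each output depends only on its own score.
\frac{\partial f(o_i)}{\partial o_j} =
  \begin{cases}
    1, & i = j \text{ and } o_i > 0, \\
    0, & \text{otherwise,}
  \end{cases}
\qquad
% Softmax: a dense Jacobian coupling all scores.
\frac{\partial p_i}{\partial o_j} = p_i \left( \delta_{ij} - p_j \right),
\quad p_i = \frac{e^{o_i}}{\sum_k e^{o_k}}.
```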
Future Directions
The paper suggests exploring enhancements to mitigate dying ReLU neurons, such as the integration of ReLU variants like Parametric ReLU (PReLU) or Leaky ReLU. These variants could potentially overcome the limitations encountered in this paper.
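A brief Keras sketch of what such a swap could look like at the classification layer follows; the 10-class width and the 0.1 negative slope are illustrative assumptions, not values from the paper.

```python
import tensorflow as tf

# Leaky ReLU head: keeps a small, fixed negative slope so scores below zero
# still pass a gradient instead of being clamped flat.
leaky_head = tf.keras.Sequential([
    tf.keras.layers.Dense(10),            # raw class scores
    tf.keras.layers.LeakyReLU(0.1),       # negative slope of 0.1 (assumed)
])

# Parametric ReLU head: the negative slope is learned during training.
prelu_head = tf.keras.Sequential([
    tf.keras.layers.Dense(10),
    tf.keras.layers.PReLU(),
])
```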
Another area of future research could involve detailed numerical inspection of gradient behavior in DL-ReLU models versus DL-Softmax models. This could provide a deeper understanding of the dynamics involved and enable the development of more robust ReLU-based classifiers.
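One way such an inspection might be set up (a sketch assuming `model`, `loss_fn`, and a batch `(x, y)` are already defined; this is not code from the paper) is to record per-layer gradient norms during a training step and compare the two classifier types side by side.

```python
import tensorflow as tf

def gradient_norms(model, loss_fn, x, y):
    """Return the gradient norm of the loss w.r.t. each trainable variable."""
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    grads = tape.gradient(loss, model.trainable_variables)
    return {
        v.name: (tf.norm(g).numpy() if g is not None else 0.0)
        for v, g in zip(model.trainable_variables, grads)
    }
```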
Conclusion
This paper contributes useful insights into using ReLU as a classification function in deep learning models and sets the stage for future work on optimizing ReLU-based classifiers. While the traditional Softmax function remains robust and effective, the simplicity and computational benefits of ReLU in the classification role warrant further investigation and could lead to notable gains in neural network design and efficiency.