An Evaluation of the LiSHT Activation Function for Neural Networks
The paper "LiSHT: Non-Parametric Linearly Scaled Hyperbolic Tangent Activation Function for Neural Networks" addresses a critical challenge in neural network training, namely the issues arising from activation functions that suffer from dying gradient problems and the non-utilization of significant negative input values. The authors propose a novel activation function named LiSHT (Linearly Scaled Hyperbolic Tangent) and investigate its effectiveness in comparison to several well-known activation functions including Tanh, ReLU, PReLU, LReLU, and Swish.
LiSHT Activation Function
LiSHT is introduced as a non-parametric activation function obtained by linearly scaling the traditional Tanh function: LiSHT(x) = x · tanh(x). Because negative inputs are mapped to positive outputs, the function keeps negative input values in play rather than discarding them, which helps counter the dying-gradient issue. It retains the symmetry and smoothness of Tanh while remaining unbounded above, preserving the non-linearity needed for deep learning tasks.
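As a concrete reference, the sketch below implements the function and its analytic derivative in NumPy; the finite-difference check at the end is purely illustrative and not part of the paper.

```python
import numpy as np

def lisht(x):
    """LiSHT activation: the input scaled by its own hyperbolic tangent."""
    return x * np.tanh(x)

def lisht_grad(x):
    """Analytic derivative: d/dx [x * tanh(x)] = tanh(x) + x * (1 - tanh(x)**2)."""
    t = np.tanh(x)
    return t + x * (1.0 - t**2)

if __name__ == "__main__":
    x = np.linspace(-4.0, 4.0, 9)
    print(lisht(x))  # non-negative and symmetric about zero
    # Finite-difference sanity check of the gradient
    eps = 1e-6
    fd = (lisht(x + eps) - lisht(x - eps)) / (2 * eps)
    assert np.allclose(fd, lisht_grad(x), atol=1e-5)
```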
Experimental Results
The paper presents empirical evidence of LiSHT's performance across several data types, including vector data, image data, and natural language data. Specifically:
- Image Classification on CIFAR-100: Integrated into a Residual Network (ResNet), LiSHT improves accuracy by 9.48% over Tanh, 3.40% over ReLU, 3.16% over PReLU, 4.26% over LReLU, and 1.17% over Swish.
- Vector Data and Natural Language Processing: Experiments with Multi-layer Perceptron (MLP) and Long Short-Term Memory (LSTM) networks show that LiSHT achieves superior classification results, consistently outperforming the other activation functions considered (a minimal drop-in usage sketch follows this list).
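The sketch below shows how LiSHT can be used as a drop-in activation in a small PyTorch MLP. The layer sizes and the use of PyTorch are illustrative assumptions, not the architectures or framework reported in the paper.

```python
import torch
import torch.nn as nn

class LiSHT(nn.Module):
    """LiSHT as a drop-in replacement for nn.ReLU or nn.Tanh."""
    def forward(self, x):
        return x * torch.tanh(x)

# Hypothetical MLP classifier; dimensions are chosen only for illustration.
model = nn.Sequential(
    nn.Linear(64, 128),
    LiSHT(),
    nn.Linear(128, 128),
    LiSHT(),
    nn.Linear(128, 10),
)

x = torch.randn(32, 64)   # dummy batch of 32 feature vectors
logits = model(x)
print(logits.shape)       # torch.Size([32, 10])
```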
Analysis of Activation Maps and Weight Distributions
The activation feature maps generated with LiSHT contain fewer non-learnable (dead) filters, indicating that the dying-neuron problem is mitigated. In addition, the weight distributions of the trained networks are more symmetric about zero, suggesting that LiSHT allows weights to explore both positive and negative regions effectively during learning.
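A rough, hypothetical way to probe both effects on one's own network is sketched below. It uses an untrained toy layer and simple statistics, not the paper's exact analysis procedure.

```python
import torch
import torch.nn as nn

def lisht(x):
    return x * torch.tanh(x)

# Untrained toy layer used only to illustrate the two diagnostics;
# in the paper these statistics are computed on trained networks.
layer = nn.Linear(100, 100)
batch = torch.randn(1024, 100)

with torch.no_grad():
    acts = lisht(layer(batch))
    # 1) "Dead" (non-learnable) units: outputs whose variance across the
    #    batch is near zero respond to almost no input.
    dead = (acts.var(dim=0) < 1e-6).sum().item()
    print(f"near-dead units: {dead} / {acts.shape[1]}")

    # 2) Weight symmetry: compare the share of positive vs. negative weights.
    w = layer.weight.flatten()
    pos = (w > 0).float().mean().item()
    neg = (w < 0).float().mean().item()
    print(f"positive weights: {pos:.2%}, negative weights: {neg:.2%}")
```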
Implications and Future Directions
The introduction of LiSHT has practical implications for deep learning architecture design, offering a promising route to more efficient model training and improved classification accuracy across diverse domains. Its unbounded, symmetric, and non-monotonic behavior also provides insight for developing more adaptive and efficient activation functions in future neural network models.
Moreover, the paper's findings stimulate further exploration into activation function design, potentially encouraging the integration of LiSHT with other emerging AI technologies and models. As neural networks continue to evolve, the principles underlying LiSHT could inspire new methodologies aimed at overcoming limitations inherent in current activation mechanisms.
In conclusion, LiSHT represents a promising advancement in the field of neural network activation functions, showing significant improvement over conventional methods. Its application across multiple benchmark datasets and model architectures paves the way for refined activation function strategies that could drive more effective and robust deep learning systems in varied applications.