
Swish-T: Enhancing Swish Activation with Tanh Bias for Improved Neural Network Performance

Published 1 Jul 2024 in cs.LG and cs.CV (arXiv:2407.01012v3)

Abstract: We propose the Swish-T family, an enhancement of the existing non-monotonic activation function Swish. Swish-T is defined by adding a Tanh bias to the original Swish function. This modification creates a family of Swish-T variants, each designed to excel in different tasks, showcasing specific advantages depending on the application context. The Tanh bias allows for broader acceptance of negative values during initial training stages, offering a smoother non-monotonic curve than the original Swish. We ultimately propose the Swish-T$_{\textbf{C}}$ function, while Swish-T and Swish-T$_{\textbf{B}}$, byproducts of Swish-T$_{\textbf{C}}$, also demonstrate satisfactory performance. Furthermore, our ablation study shows that using Swish-T$_{\textbf{C}}$ as a non-parametric function can still achieve high performance. The superiority of the Swish-T family has been empirically demonstrated across various models and benchmark datasets, including MNIST, Fashion MNIST, SVHN, CIFAR-10, and CIFAR-100. The code is publicly available at https://github.com/ictseoyoungmin/Swish-T-pytorch.
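For intuition, here is a minimal PyTorch sketch of the idea the abstract describes: a Swish term with a bounded tanh bias added to it. The specific form f(x) = x·σ(βx) + α·tanh(x), and the parameter names alpha and beta, are illustrative assumptions; the exact definitions of Swish-T and its B/C variants are given in the paper and the official repository linked above.

```python
import torch
import torch.nn as nn


class SwishTSketch(nn.Module):
    """Illustrative Swish-with-tanh-bias activation (not the authors' exact definition).

    Assumed form: f(x) = x * sigmoid(beta * x) + alpha * tanh(x).
    The tanh term shifts the curve so that more negative inputs pass
    through early in training, as the abstract describes.
    """

    def __init__(self, alpha: float = 0.1, beta: float = 1.0, trainable: bool = True):
        super().__init__()
        if trainable:
            # Learnable scalars, mirroring the parametric variants.
            self.alpha = nn.Parameter(torch.tensor(alpha))
            self.beta = nn.Parameter(torch.tensor(beta))
        else:
            # Fixed constants, mirroring the non-parametric ablation.
            self.register_buffer("alpha", torch.tensor(alpha))
            self.register_buffer("beta", torch.tensor(beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Swish term plus a bounded tanh bias.
        return x * torch.sigmoid(self.beta * x) + self.alpha * torch.tanh(x)


if __name__ == "__main__":
    act = SwishTSketch(trainable=False)
    x = torch.linspace(-3.0, 3.0, steps=7)
    print(act(x))  # note the smooth non-monotonic dip for negative inputs
```

The module is a drop-in replacement for nn.SiLU in any network; consult the paper for the exact parameterizations of Swish-T$_{\textbf{B}}$ and Swish-T$_{\textbf{C}}$.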
