
Adaptive Parametric Activation (2407.08567v2)

Published 11 Jul 2024 in cs.CV and cs.LG

Abstract: The activation function plays a crucial role in model optimisation, yet the optimal choice remains unclear. For example, the Sigmoid activation is the de-facto activation in balanced classification tasks, however, in imbalanced classification, it proves inappropriate due to bias towards frequent classes. In this work, we delve deeper in this phenomenon by performing a comprehensive statistical analysis in the classification and intermediate layers of both balanced and imbalanced networks and we empirically show that aligning the activation function with the data distribution, enhances the performance in both balanced and imbalanced tasks. To this end, we propose the Adaptive Parametric Activation (APA) function, a novel and versatile activation function that unifies most common activation functions under a single formula. APA can be applied in both intermediate layers and attention layers, significantly outperforming the state-of-the-art on several imbalanced benchmarks such as ImageNet-LT, iNaturalist2018, Places-LT, CIFAR100-LT and LVIS and balanced benchmarks such as ImageNet1K, COCO and V3DET. The code is available at https://github.com/kostas1515/AGLU.

Summary

  • The paper introduces the Adaptive Parametric Activation function to dynamically adjust neural activations, addressing biases in imbalanced datasets.
  • Empirical results show APA improves top-1 accuracy by up to 1.4pp on benchmarks like ImageNet-LT and CIFAR100-LT.
  • Theoretical insights reveal that APA generalizes traditional activation functions, enhancing model robustness and convergence.

Adaptive Parametric Activation: An Insightful Overview

The paper "Adaptive Parametric Activation", authored by Konstantinos Panagiotis Alexandridis, Jiankang Deng, Anh Nguyen, and Shan Luo, explores the fundamental role of activation functions in neural networks, focusing on their impact on balanced and imbalanced image classification. The paper's novelty lies in the Adaptive Parametric Activation (APA) function, which aims to rectify the biases exhibited by traditional activation functions such as the Sigmoid when dealing with imbalanced datasets.

Introduction

Activation functions are critical components of neural networks that capture non-linearities in data. However, their efficacy varies significantly with the specifics of the data distribution. The Sigmoid function, for instance, works well on balanced datasets but suffers significant performance drops on imbalanced datasets due to its bias towards frequent classes. This motivates the need for an adaptable activation function that aligns with the data distribution across scenarios.

The Adaptive Parametric Activation (APA) Function

APA is introduced as a versatile activation function that subsumes various commonly used activation functions under a single framework. This flexibility is achieved through parameterisation, enabling APA to adjust its behavior dynamically.
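As a concrete illustration, the sketch below implements the generalised-sigmoid form attributed to the paper, APA(z; κ, λ) = (λ·exp(−κz) + 1)^(−1/λ) with learnable κ and λ, together with a gated AGLU variant (z · APA(z)) as referenced in the results below. Parameter shapes, defaults, and the clamping detail are illustrative assumptions, not copied from the official AGLU repository.

```python
# Minimal sketch of APA / AGLU in PyTorch (illustrative, not the authors' implementation).
import torch
import torch.nn as nn


class APA(nn.Module):
    """Adaptive Parametric Activation: (lambda * exp(-kappa * z) + 1) ** (-1 / lambda)."""

    def __init__(self):
        super().__init__()
        # lambda = kappa = 1 recovers the Sigmoid; lambda -> 0 approaches the Gumbel CDF.
        self.lam = nn.Parameter(torch.ones(1))
        self.kappa = nn.Parameter(torch.ones(1))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        lam = torch.clamp(self.lam, min=1e-4)  # keep the exponent -1/lambda well defined
        return torch.pow(lam * torch.exp(-self.kappa * z) + 1.0, -1.0 / lam)


class AGLU(nn.Module):
    """Gated variant for intermediate layers: z * APA(z), in the spirit of SiLU/GELU-style gating."""

    def __init__(self):
        super().__init__()
        self.apa = APA()

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * self.apa(z)


x = torch.randn(4, 16)
print(AGLU()(x).shape)  # torch.Size([4, 16])
```

Because λ and κ are trained with the rest of the network, the unit can drift between Sigmoid-like and Gumbel-like behaviour depending on what the data distribution demands.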

Theoretical Underpinning

The authors present a theoretical foundation that views activation functions as enforcing a prior belief about data distributions. For example, using Sigmoid assumes a logistic distribution of errors. The APA function generalises this by introducing parameters that can adapt to different distributions, thus enhancing convergence and performance across both balanced and imbalanced datasets.
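To make the distributional view concrete, the short numerical check below (illustrative, not taken from the paper's code) verifies that the Sigmoid coincides with the logistic CDF and that the same parametric form with a small λ approaches the Gumbel CDF.

```python
# Quick numerical check of the distributional interpretation (illustrative).
import numpy as np
from scipy.stats import logistic, gumbel_r

z = np.linspace(-5, 5, 101)

def apa(z, lam, kappa):
    return (lam * np.exp(-kappa * z) + 1.0) ** (-1.0 / lam)

print(np.allclose(apa(z, lam=1.0, kappa=1.0), logistic.cdf(z)))               # True: Sigmoid is the logistic CDF
print(np.allclose(apa(z, lam=1e-6, kappa=1.0), gumbel_r.cdf(z), atol=1e-4))   # ~True: small lambda approaches the Gumbel CDF
```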

Empirical Validation

Through extensive empirical analysis, the authors show that data imbalance significantly affects both classification logits and intermediate layer activations. For example, logit distributions align more closely with the logistic distribution in balanced training and with the Gumbel distribution in imbalanced scenarios. Similar disparities are observed in intermediate layers — attention mechanisms show a bias towards frequent classes in imbalanced training setups.
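A minimal sketch of this kind of goodness-of-fit comparison (hypothetical, not the authors' analysis code) fits candidate distributions to a set of logits and compares them with a Kolmogorov-Smirnov statistic:

```python
# Illustrative goodness-of-fit comparison for classification logits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
logits = rng.gumbel(loc=0.0, scale=1.0, size=5000)  # stand-in for logits collected from a trained model

for name, dist in [("logistic", stats.logistic), ("gumbel", stats.gumbel_r)]:
    params = dist.fit(logits)                          # maximum-likelihood location/scale fit
    ks = stats.kstest(logits, dist.cdf, args=params).statistic
    print(f"{name}: KS statistic = {ks:.4f}")          # lower = closer fit
```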

Numerical Results

APA demonstrated its superiority across several benchmarks:

  • ImageNet-LT: APA achieved an average top-1 accuracy improvement of 1.4pp over the SE-ResNet50 baseline and outperformed other state-of-the-art methods such as RIDE and DOC in handling frequent and rare classes.
  • iNaturalist2018 and Places-LT: APA improved the SE baseline's performance by 1.0pp and 0.8pp respectively, and combining it with AGLU boosted performance further.
  • CIFAR100-LT: APA increased the accuracy by 1.0pp for an imbalance factor of 100 and retained robust performance across different imbalance factors.

Practical and Theoretical Implications

The ability of APA to adapt to the underlying data distribution makes it a promising tool in various neural network applications beyond image classification. Theoretically, it challenges the traditional fixed-choice activation functions paradigm, proposing a more flexible, data-driven approach. Practically, APA enhances model robustness and generalisability, particularly in scenarios involving class imbalance — a common occurrence in real-world datasets.

Future Developments

The introduction of APA opens various avenues for future research. One potential development is the exploration of APA in diverse neural network architectures, including Transformers and attention-based models. Additionally, fine-tuning the parametric form of APA for specific domains such as NLP, where different data distribution characteristics are prevalent, could yield substantial benefits.

Conclusion

The paper presents a well-founded argument supported by strong empirical evidence that highlights the limitations of traditional activation functions in imbalanced data scenarios. By introducing APA, the authors contribute a significant advancement in neural network optimization, with broad implications for both theoretical research and practical applications in AI. This adaptable activation function paves the way for more resilient and accurate neural networks capable of handling the challenges posed by varying data distributions.
